Abstract. The halting problem is undecidable, but can it be solved for “most” inputs?
This natural question was considered in a number of papers, in different settings. We revisit 
their results and show that most of them can be easily proven in a natural framework 
of optimal machines (considered in algorithmic information theory) using the notion of 
Kolmogorov complexity. 

We also consider some related questions about this framework and about asymptotic 
properties of the halting problem. In particular, we show that the fraction of terminating 
programs cannot have a limit, and all limit points are Martin-Löf random reals. We
then consider mass problems of finding an approximate solution of the halting problem and
probabilistic algorithms for them, proving both positive and negative results. 

We consider the fraction of terminating programs that require a long time for termination, 
and describe this fraction using the busy beaver function. We also consider approximate 
versions of separation problems, and revisit Schnorr’s results about optimal numberings 
showing how they can be generalized. 


1. Introduction 

One of the most basic theorems of computability theory is that the halting problem is
undecidable, i.e., there is no algorithm that, given a computation, says whether it terminates 
or not. A related result says that for some computations the termination statement is 
undecidable in Gödel’s sense (neither provable nor refutable). Still, in many cases the
termination question is not that hard. It could be that the difficult cases are rare exceptions
and that for most cases the answer can be obtained effectively (and perhaps even easily). 

This question, while natural, is difficult to formulate. For qualitative questions it does 
not matter which computational model or programming language we use in the formulation 
of the halting problem. Technically speaking, all reasonable formulations lead to m-complete 
computably enumerable sets, and all m-complete sets are computably isomorphic (Myhill 
isomorphism theorem, see, e.g., [18]). However, for quantitative questions the choice of
programming language is very important: it is easy to imagine some universal programming 
language for which most programs terminate (or hang) for some trivial reasons. 

One may try to fix some computational model or programming language. For example, 
we may consider Turing machines with a fixed alphabet, and then ask whether there exists 
an approximation algorithm for the halting problem, whose success rate among all machines 
with n states converges to 1 as $n \to \infty$. This question, however, is sensitive to the details
of the definition. For example, it was shown in m that for Turing machines with one-sided 
tape the success rate may converge to 1: speaking informally, this happens because most 
machines fall off the tape rather quickly. This argument, however, does not work for machines 
with a two-sided tape, for which the similar question remains open. 

Looking for more invariant statements, one should put some restrictions on the way the 
computations are encoded, and these restrictions could be quite technical. For example, in 
[15, 19, 12] (see also [21] for a survey of these and some other results), numberings of computable
functions are considered where each function occupies an $\Omega(1)$-fraction of n-bit programs for
all sufficiently large n, together with encodings for pairs that have some special property. 
However, in m a more natural requirement for the programming language (motivated by 
the algorithmic information theory) was already suggested. We show (Section 3) that results
about approximate algorithms for the halting problem from [15, 19, 12] remain true in this simple
setting, and can be easily proved using Kolmogorov complexity. Moreover, they remain true 
for a weaker requirement than used in ESE!; we discuss this (and some related questions) 
in Section 2.

The next three sections (Sections 4-6) are devoted to different questions related to the
halting problem and its approximate solutions. In Section 4 we consider the fraction of
terminating programs among all programs of length at most n. We prove that this fraction
has no limit as $n \to \infty$, and limit points are Martin-Löf random reals (even relative to $\mathbf{0}'$).
We prove that the limsup of this fraction is an upper semicomputable $\mathbf{0}'$-random number
and every number of this type can appear as lim sup for some machine (Theorem 4.1). In
Section 5 we consider the task “find an approximate solution of the halting problem” as a
mass problem in Medvedev’s sense, and study whether different versions of this problem
can be solved with positive probability by a randomized algorithm, obtaining both positive
and negative results for different versions. In Section 6 we consider the following question:
how long do we need to wait until all terminating programs of size at most n, except for a
given fraction, have terminated? This question has a natural answer in terms of the busy beaver
function, and it can be easily proven using Kolmogorov complexity.

Finally, in the last section (Section 7) we discuss some generalizations of the previous
results. 

We assume that the reader has some background in computability theory, algorithmic 
randomness and Kolmogorov complexity (see, e.g., PEI El E]). We denote by C plain 
Kolmogorov complexity and by K prefix-free Kolmogorov complexity.

2. Optimal and effectively optimal machines

2.1. Definitions. The halting problem is described in different ways in different textbooks. 
Sometimes one considers the diagonal function $\varphi_x(x)$, i.e., asks whether a program terminates
on itself. One can also ask whether a given program terminates on a given input, or whether 
a given program without input terminates. The latter version looks most suitable for us. 
Indeed, the diagonal function is considered mostly for historical reasons (Cantor’s diagonal 
argument and first proofs of undecidability; in these arguments we construct a function that 
differs from $\varphi_x$ somewhere, and use x as the difference point, but any other sequence of
difference points could be used as well). For the second version (when we ask whether $\varphi_x(y)$
is defined) we need to combine x and y into some input pair and measure the size of this 
pair, if we want to ask about the fraction of correct answers for inputs of given size — and 
in this way we get a function of one argument anyway. 

So we consider the halting problem for programs without inputs. Then the semantics of
the programming language is described by an interpreter machine (algorithm, computable
function) U; its inputs and outputs are binary strings. Its inputs are called ‘programs’
and U(p) is the output of program p (undefined if p never terminates). We use the name
machine for partial computable functions whose arguments and values are binary strings, 
and put the following restrictions. 

Definition 2.1. A machine U is universal if every other machine V is reducible to U in 
the following sense: there exists a total computable function h such that V(x) = U(h(x)) 
for every x (either both sides are undefined or they are defined and equal). 

Informally, this means that every other programming language V can be effectively 
translated into U (and h is the translator). In the following definition we additionally require 
that the length of programs does not increase significantly during the translation. We say 
that a (total) function h whose arguments and values are binary strings is length-bounded if 
it increases the length at most by a constant, i.e., $|h(x)| \le |x| + c$ for some c and all x.

Definition 2.2. A machine U is effectively optimal if every other machine V is reducible 
to U by a total computable length-bounded function h. 

Remark 2.3. Effective optimality is an obvious strengthening of optimality (see next 
definition). It is important that the function h in the definition of an effectively optimal
machine is total: if we allowed the function h to be partial computable, the definition would
be equivalent to that of an optimal machine. Indeed, if U is optimal with some $O(1)$-constant
c, we can compute h(x) by effectively searching for a U-description of V(x) having length
at most |x| + c.

A weaker requirement is used in the definition of Kolmogorov complexity where for each 
machine U we consider the complexity function $C_U(x) = \min\{|p| : U(p) = x\}$.

Definition 2.4. A machine U is optimal if the complexity function $C_U$ is minimal up to
an $O(1)$ additive term: for every other machine V there exists c such that $C_U(x) \le C_V(x) + c$
for all x.
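
To make the complexity function concrete, here is a minimal sketch for toy machines. It assumes (for illustration only) that a machine is given as a finite Python dictionary from programs to outputs; the tables `U` and `V` are hypothetical and not taken from the text. For genuine optimal machines the complexity function is not computable, so this only illustrates the definition and the inequality of Definition 2.4.

```python
def complexity(machine, x):
    """Return min{|p| : machine(p) = x} for a finite toy machine.

    `machine` is a hypothetical finite table mapping programs (bit strings)
    to outputs; programs absent from the table are treated as non-terminating.
    Returns None when x has no description at all.
    """
    lengths = [len(p) for p, out in machine.items() if out == x]
    return min(lengths) if lengths else None


# Toy machine V (illustrative table).
V = {"0": "aa", "11": "b"}
# Toy machine U simulates V after the 1-bit prefix "1" and has one extra program.
U = {"1" + p: out for p, out in V.items()}
U[""] = "b"

# On every output of V, U is at most 1 bit worse than V (here c = 1).
assert all(complexity(U, x) <= complexity(V, x) + 1 for x in V.values())
```

In this toy case the constant c = 1 is the length of the prefix that U uses to simulate V.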

L. BIENVENU, D. DESFONTAINES, AND A. SHEN

It is easy to see that effectively optimal machines exist: take some universal function
$\Phi(p, x)$ such that every machine appears among the functions $\varphi_p(x) = \Phi(p, x)$, and then let
$U(\hat{p}x) = \Phi(p, x)$ where $\hat{p}$ is some self-delimited encoding of p (say, all bits are doubled
and 01 is added at the end). It is also easy to see that every effectively optimal machine is
optimal. The function $C_U$ for some fixed optimal U is called Kolmogorov complexity.
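
The self-delimited encoding mentioned above (double every bit, then append 01) can be sketched as follows. The function names `encode` and `split` are ours; the point is that the encoding of p can be separated from whatever follows it, and that prepending it to x adds only 2|p| + 2 bits, a constant for fixed p.

```python
def encode(p: str) -> str:
    """Self-delimiting code: double every bit of p, then append '01'."""
    return "".join(b + b for b in p) + "01"


def split(s: str):
    """Split s = encode(p) + x back into the pair (p, x).

    Reads s two bits at a time: '00' and '11' are doubled bits of p,
    and the first pair '01' marks the end of the code.
    """
    p = []
    i = 0
    while i + 1 < len(s):
        pair = s[i:i + 2]
        if pair == "01":
            return "".join(p), s[i + 2:]
        if pair not in ("00", "11"):
            raise ValueError("not a valid self-delimiting prefix")
        p.append(pair[0])
        i += 2
    raise ValueError("terminator '01' not found")


# The translator x -> encode(p) + x is total and length-bounded for fixed p.
assert split(encode("101") + "0011") == ("101", "0011")
```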

The notion of an optimal machine was introduced and used by Solomonoff and Kolmogorov
(see EH for the historical account); the notion of an effectively optimal machine
was introduced in [20]. In that paper Schnorr used the name “optimal enumeration”; we use
the name “effectively optimal” as a reminder that this is an effectivization of the optimality 
requirement used by Solomonoff and Kolmogorov. 

Remark 2.5. One can also require that the translator function h in the definition of an 
effectively optimal machine can be found effectively given a machine V. This does not give 
a stronger notion, however: every effectively optimal machine has this property. Indeed, 
the construction of an effectively optimal machine U given above guarantees this property 
if $\Phi$ is a Gödel universal function; then U can be reduced to an arbitrary effectively optimal
machine U', and this reduction guarantees the same property for U'.

Schnorr noted in [20] that there exist optimal machines that are not effectively optimal. 
For example, the machine that keeps the last bit unchanged may be optimal (e.g., one may 
apply an optimal machine to the preceding bits) but cannot be effectively optimal or even
universal. If it were, the translator function could be used to separate the computations that
give output 0 and output 1 (the standard inseparable sets). There are many other examples 
of optimal but not effectively optimal machines. For example, one may consider an optimal 
machine where the empty string $\Lambda$ has unique preimage $\Lambda$ (all other programs returning $\Lambda$
are suppressed). An interesting class of optimal machines that cannot be effectively optimal 
is given in the following proposition. 


Definition 2.6. A machine is called left-total if for every n the n-bit strings in its domain 
form an initial segment in the lexicographic ordering. 

Proposition 2.7. There exists a left-total optimal machine, but no left-total machine can be
effectively optimal. 

Proof. Every machine U can be converted to a left-total machine U' without changing the 
complexity function: when a new string p of length n appears in the domain of U, we add 
to the domain of U' the lexicographically-least string q of length n which is not in this 
domain, and set U'(q) = U(p). To prove that a left-total machine U cannot be effectively 
optimal, let us assume that U is left-total and effectively optimal and show that in this 
case for a given enumerable¹ set W one can effectively construct a different enumerable set
W', thus getting a contradiction with the fixed point theorem for enumerable sets. Indeed,
the set W is the domain of some function V that is reducible to U, so V(x) = U(h(x)) for
some length-bounded computable total function h. Since h is length-bounded, we can find
two different strings u and v such that h(u) and h(v) have the same length. If, say, $h(u) \le h(v)$
in the lexicographic ordering, we know that $v \in W$ implies $u \in W$, so the set $W' = \{v\}$
is guaranteed to be different from W. Note that we used Remark 2.5 in this argument,
since the transformation of W into W' should be effective to get a contradiction with the
fixed-point theorem. □
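
The first half of this proof (the conversion of U into a left-total machine U') can be sketched on a finite prefix of the enumeration. The sketch assumes, for illustration, that the enumeration is given as a list of (program, output) pairs in order of appearance, each program listed once; the function name `make_left_total` is ours.

```python
def make_left_total(enumeration):
    """Build a table for U' from a finite enumeration of U.

    The k-th program of length n to appear (k = 0, 1, ...) is replaced by
    the k-th n-bit string in lexicographic order, so for each n the n-bit
    strings in the domain of U' form an initial segment.
    Assumes every program occurs at most once in `enumeration`.
    """
    seen = {}      # length n -> number of n-bit programs seen so far
    u_prime = {}
    for p, out in enumeration:
        n = len(p)
        k = seen.get(n, 0)
        # k-th n-bit string; the empty string is the only string of length 0
        q = format(k, "b").zfill(n) if n > 0 else ""
        u_prime[q] = out
        seen[n] = k + 1
    return u_prime


table = make_left_total([("11", "a"), ("0", "b"), ("01", "c"), ("1", "d")])
assert table == {"00": "a", "0": "b", "01": "c", "1": "d"}
```

Each output keeps a description of the same length as before, which is why the complexity function does not change.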


¹ For brevity we use the name ‘enumerable’ for recursively (computably) enumerable sets.

Here are two other examples of optimal machines that cannot be effectively optimal:

Proposition 2.8. 

(a) There exists an optimal machine whose domain is a simple set (in Post’s sense: the
complement is infinite but does not contain an infinite enumerable set); such a machine
cannot be effectively optimal.

(b) There exists an optimal machine such that $U(p) \ne U(p')$ for every two different strings
p, p' of the same length; such a machine cannot be effectively optimal.


Proof. 

(a) To prove the existence of optimal machines with simple domains, consider first an 
optimal machine U whose domain contains at most half of the n-bit strings for each n. 
It can easily be constructed, say, by appending a zero to all arguments. Now we modify 
this machine by extending its domain, and get a machine V with the same complexity 
function and simple domain. To make the domain simple, for every enumerable set $W_n$
that contains strings of length greater than n, we add one of these strings to the domain 
of U. This is done in the following way. 

We simulate U for all arguments and enumerate all $W_n$ in parallel. When we discover
that U(p) = x for some p that is not in the domain of V (yet), we let V(p) = x as well.
When some string x of length greater than n appears in $W_n$, we check whether V is
already defined on this string. If yes, we do nothing and forget about $W_n$. If not, we
add this string to the domain of V with some nonsense value (say, the empty string)
and again forget about $W_n$. The only problem arises when U becomes defined on some
element that was earlier added to the domain of V with nonsense value. Then, like 
a person in a concert hall whose seat is already occupied, the value U(p) is assigned 
to some free seat, i.e., to some unused argument of the same length. In this way 
the complexity function does not increase. Of course, later the legal owner of this 
new seat may arrive; then she is sent to some free seat of the same length, etc. Our 
assumption guarantees that we do not run out of places, since for strings of length N
only $W_1, \ldots, W_{N-1}$ could require seating at length N, and we have $2^{N-1}$ free seats.


This ends the existence proof. It remains to note that the domain of a universal machine 
is m-complete and therefore cannot be simple (see, e.g., 


(b) We may use the construction from the proof of Proposition 2.7 but omit the values that 
already appeared among the arguments of the same length. To show that a machine U 
with this property cannot be effectively optimal, consider the function U' defined as 
follows: 

$$U'(x) = \begin{cases} 0^{|x|}, & \text{if } |x| \in P; \\ x, & \text{if } |x| \in Q; \\ \text{undefined} & \text{otherwise,} \end{cases}$$

where P and Q are enumerable inseparable sets of natural numbers (lengths). If h is a
length-bounded computable total function that reduces U' to U, we can separate P
and Q using the following observation: if $n \in Q$, then h(x) should be different for all x
of length n, and if $n \in P$, then all h(x) are descriptions of the same string of length at
most $n + O(1)$, so there are at most $n + O(1)$ different possible values for h(x). □


We finish this section with one last observation. One can consider the following property of 
a machine U that looks weaker than effective optimality: for every machine V there exists 
a total length-bounded computable function h such that U(h(x)) is equal to V(x) if V(x) is
defined, and U(h(x)) may be arbitrary if V(x) is undefined. In other words, we allow the
translation of a non-terminating V-program to be a terminating U-program; note that this is
enough to conclude that $C_U(x) \le C_V(x) + O(1)$. In fact, this property is not weaker at all:

Proposition 2.9. Every machine U with this property is effectively optimal. 


Proof. Let V be some other machine for which we want to find a length-bounded translator 
required by the definition of effective optimality. We know that for every machine V' there
exists a “semi-translator” of V' to U, i.e., a length-bounded total computable function h with
the property described above. Similarly to Remark 2.5 we may assume that h can be found
effectively given V'. Let us consider the following machine V'; using the fixed-point theorem,
we may assume that V' knows the semi-translator h of V' to U:


$$V'(x) = \begin{cases} V(x), & \text{if } V(x) \text{ is defined;} \\ \text{something different from } U(h(x)), & \text{if } U(h(x)) \text{ is defined;} \\ \text{undefined} & \text{otherwise.} \end{cases}$$


This definition is understood as follows: on a given input x, V' computes V(x) and U(h(x))
in parallel until one of the two computations terminates; then the first or the second line is
applied (and we do not care whether the other computation terminates, too).

In fact, the second line is never used: if it were, V'(x) would be defined and different
from U(h(x)), so h would not be a semi-translator. So V' is the same function as V, and U(h(x))
is undefined if V'(x) = V(x) is undefined, so h is not only a semi-translator for V but also
a translator. □


2.2. Domains of optimal and effectively optimal machines. In the sequel we consider 
the halting problem for the domains of optimal and effectively optimal machines (for most 
results optimality is enough, but not for all, as we will see). This motivates the following 
question: which (enumerable) sets are domains of optimal and effectively optimal machines? 

The answer to the first question was given in [5]; for the reader’s convenience we
reproduce the proof here. 

Theorem 2.10 ([5]). A set S is a domain of an optimal machine if and only if the 
Kolmogorov complexity of the number of strings of length at most n in S is $n - O(1)$:

$$C(\#\{x : (|x| \le n) \land x \in S\}) = n - O(1). \qquad (*)$$

Note that we count the strings of length at most n in S, not exactly n; it is possible, say, 
that an optimal machine is undefined on all strings of even length. 

Proof. Let us prove first the “only if” part. Let $H_n$ be the cardinality of the set in question
(for some optimal machine). Then $C(H_n \mid n) \le C(H_n) \le n$ with $O(1)$-precision since
$H_n \le 2^{n+1}$. To prove the reverse inequality $C(H_n) \ge n - O(1)$, and even the stronger
inequality $C(H_n \mid n) \ge n - O(1)$, assume that we have a program q that maps n to $H_n$ and
is d bits shorter than n. We may take an $O(\log d)$-bit self-delimiting description of d and
append q to it; the resulting string allows us to reconstruct d, then q, then $n = |q| + d$, then
$H_n = [q](n)$. (Here by $[q](n)$ we denote the output of program q on input n.) Then we find
all strings of length at most n in the domain of the optimal machine, and a string z that has
no description of size at most n. This gives $C(z) > n$ and $C(z) \le O(\log d) + n - d + O(1)$
at the same time, so $d = O(1)$, and $C(H_n \mid n) \ge n - O(1)$, therefore $C(H_n) \ge n - O(1)$.
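
The last step of this argument can be spelled out as a chain of inequalities. This is only a sketch: the names $c_1$ and $c_2$ are introduced here for the constants hidden in the $O(\cdot)$ terms, and $\log(d+2)$ is used so that the bound also makes sense for small d.

```latex
\begin{align*}
n < C(z) &\le (n - d) + c_1 \log(d + 2) + c_2, \\
\text{hence}\quad d - c_1 \log(d + 2) &< c_2, \\
\text{hence}\quad d &= O(1),
\end{align*}
```

since the left-hand side of the second line tends to infinity with d.
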

To prove the other implication, assume that (*) is true. Fix some optimal machine U.
For some c (to be chosen later) consider the following process. We enumerate the domain of
U and the set S in parallel, and construct a new machine V. When some element x in the
domain of U appears, we suspend the enumeration of the U-domain and wait until some
new string z of length at most |x| + c appears in S. Then we declare V(z) := U(x), and
resume the enumeration of the domain of U (in parallel with the enumeration of S). As
soon as some other element x' in the U-domain appears, we repeat this procedure, wait
for z' of length at most |x'| + c in S, declare V(z') := U(x'), etc. Note that elements in
the enumeration of S may appear when nobody is waiting for them (e.g., if they are too 
long); in this case we make V(z) defined for those elements z immediately. The value is not 
important, for example, we may declare V(z) to be the empty string in this case. 

The domain of V is exactly S: each element of S is either used for some U-value or
wasted, but V is defined on it anyway. If for some c the waiting process is always successful 
(for every element in the domain of U some suitable z in S is found), the machine V 
is optimal. It remains to show that, for large enough c, a failure of the waiting process
would contradict our complexity assumption.

Assume that some x of length k in the domain of U was not matched. This means that 
after x is enumerated in our process, no element of length at most k + c appears in S. So 
the number of elements of length at most k + c in S can be reconstructed if we know x and 
c; note that k is determined by x, so we do not need to specify it separately. The pair (x, c) 
has complexity at most $k + O(\log c)$, so this number has complexity at most $k + O(\log c)$
too, and our assumption gives $k + O(\log c) \ge k + c - O(1)$. This is possible only if c is small,
and this finishes the proof. □
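
The matching process from the second half of the proof can be replayed on a finite interleaving of the two enumerations. The following is a sketch under simplifying assumptions: the event format and the name `build_machine` are ours, and the queue of waiting programs models the suspended enumeration of the U-domain (elements that would appear during a suspension are simply handled later, in the same order).

```python
from collections import deque


def build_machine(events, c):
    """Replay a finite interleaved enumeration and build the table of V.

    `events` is a list of ("U", x, value) items (x enters dom(U) with
    output `value`) and ("S", z) items (z enters S). Returns a pair
    (V, unmatched), where `unmatched` lists the U-programs that are still
    waiting for a partner when the replay ends.
    """
    waiting = deque()  # U-programs whose outputs still need a description in S
    V = {}
    for event in events:
        if event[0] == "U":
            _, x, value = event
            waiting.append((x, value))
        else:
            _, z = event
            if waiting and len(z) <= len(waiting[0][0]) + c:
                # z is short enough: it becomes a V-description of U(x)
                x, value = waiting.popleft()
                V[z] = value
            else:
                # nobody can use z: V is defined on it with a dummy value
                V[z] = ""
    return V, [x for x, _ in waiting]


events = [("U", "0", "a"), ("S", "0000"), ("S", "01"), ("U", "1", "b"), ("S", "1")]
V, unmatched = build_machine(events, c=1)
assert V == {"0000": "", "01": "a", "1": "b"} and unmatched == []
```

In the replay the domain of V is exactly the set of strings that entered S, and every matched output gets a V-description at most c bits longer than its U-description.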

We now characterize the domains of effectively optimal machines. 

Definition 2.11. An enumerable set S is called m-optimal if every enumerable set W can 
be reduced to S by some length-bounded total computable function h: 

$$x \in W \iff h(x) \in S.$$


Obviously, a domain of an effectively optimal machine is an m-optimal set, and, as we will 
prove soon (Corollary 2.13), the reverse is also true. 

If we omit the words “length-bounded” in Definition 2.11, we get the classical notion of
an m-complete set. All m-complete sets are computably isomorphic (Myhill theorem, see,
e.g., [18]): if S and S' are two m-complete sets, there exists a computable bijection that
maps S to S'. Using the arguments of Schnorr [20], we can prove the following result.²


Theorem 2.12 (“Myhill-Schnorr theorem”). Every two m-optimal sets S and S' are 
length-bounded isomorphic: there exists a computable bijection h that maps S to S' such 
that both h and $h^{-1}$ are length-bounded (so $|h(x)| = |x| + O(1)$).


Corollary 2.13. Every m-optimal set is a domain of an effectively optimal machine. 

Indeed, a computable isomorphism between sets S and S' that is length-bounded in 
both directions, can be used to convert an effectively optimal machine with domain S into 
an effectively optimal machine with domain S'. 


² Schnorr did not consider m-optimal sets. He proved in [20] that any two effectively optimal machines
are reducible to each other by a computable bijection that is length-bounded in both directions. We can
however use essentially the same argument to prove the corresponding result for m-optimal sets, so we use
the name “Myhill-Schnorr theorem” for Theorem 2.12.

Proof of Myhill-Schnorr theorem. The proof goes in two steps. First (Lemma 2.14) we
prove that the reduction in the definition of an m-optimal set S can be made injective.
Then we use some general combinatorial statement (Lemma 2.15) to get a bidirectional
length-bounded bijection from two length-bounded injections.

It is convenient to identify strings (in length-lexicographic ordering) with natural 
numbers; then length-bounded functions become functions $h\colon \mathbb{N} \to \mathbb{N}$ such that $h(n) = O(n)$.
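
The identification of strings with natural numbers can be sketched as follows (the function names are ours). A string of length k receives a number between $2^k - 1$ and $2^{k+1} - 2$, so a bound $|h(x)| \le |x| + c$ on strings turns into $h(n) \le 2^{c+1}(n + 1)$ on numbers, i.e., $h(n) = O(n)$.

```python
def string_to_number(s: str) -> int:
    """Position of s in the length-lexicographic order: '', '0', '1', '00', ..."""
    return int("1" + s, 2) - 1


def number_to_string(n: int) -> str:
    """Inverse of string_to_number: drop the leading 1 of the binary form of n + 1."""
    return format(n + 1, "b")[1:]


assert [number_to_string(n) for n in range(7)] == ["", "0", "1", "00", "01", "10", "11"]
```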

Lemma 2.14. Let S be an m-optimal set, and let W be an arbitrary enumerable set. Then 
there exists a computable length-bounded injective total function h such that $x \in W \iff h(x) \in S$
for every x.


Proof of Lemma 2.14. As before (see Remark 2.5) we may assume that a length-bounded
total computable reduction of an arbitrary enumerable set W to S can be found effectively.
Also, as in the proof of Proposition 2.9, we use the fixed-point theorem and construct an
enumerable set W' assuming that a length-bounded total computable reduction h of W'
to S is known. The set W' is defined as follows: to determine whether some number i
belongs to W', we check first whether h(i) appears among h(0), h(1), ..., h(i − 1). If yes,
then i does not belong to W'. If not, we start two processes in parallel: we enumerate W
waiting until i appears in W, and we compute h(i + 1), h(i + 2), ... until h(i) appears in
this list. If one of these two events happens, we include i into W' (and $i \notin W'$ otherwise).

Let us show that in fact W' = W and h is injective (so the statement of the lemma
is true). Assume that h(k) = h(l) for k < l, where k is the least index with this value.
Then $l \notin W'$ since h(l) appears in the list of previous values of h, and $k \in W'$, since
h(k) = h(l) appears in the list of subsequent values of h. But
this cannot happen, since h reduces W' to S and h(k) = h(l) implies $(k \in W') \iff (l \in W')$.
Now we know that h is injective, and then the definition of W' guarantees that W' = W.
Lemma 2.14 is proven. □


Now we switch to the combinatorial part. 


Lemma 2.15. Consider two total injective functions $f\colon \mathbb{N}_1 \to \mathbb{N}_2$ and $g\colon \mathbb{N}_2 \to \mathbb{N}_1$ (here
$\mathbb{N}_1 = \mathbb{N}_2 = \mathbb{N}$, and subscripts 1 and 2 are used just to show that we consider two copies, $\mathbb{N}_1$
on the left and $\mathbb{N}_2$ on the right). Consider the bipartite graph $E \subset \mathbb{N}_1 \times \mathbb{N}_2$ that combines
edges of type (x, f(x)) and of type (g(y), y). There exists a bijection $h\colon \mathbb{N}_1 \to \mathbb{N}_2$ with the
following properties:

• the vertices connected by h (i.e., x and h(x)) belong to the same connected component
of E;

• h(i) is bounded by $\max\{f(i') \mid i' \le i\}$;

• $h^{-1}(j)$ is bounded by $\max\{g(j') \mid j' \le j\}$.

If f and g are computable, h can be made computable.


(In our application the functions f and g are reductions between two enumerable sets
going in opposite directions, so either all the elements of the connected component belong 
to the corresponding sets, or all elements of the connected component do not belong to the 
corresponding sets.) 


Proof of Lemma 2.15. We construct h step by step: at each stage we have a finite one-to-one
correspondence between two finite subsets of $\mathbb{N}_1$ and $\mathbb{N}_2$. Then a new pair is added to the
current h, and at the same time we modify f and g in such a way that f and g (the current
versions of f and g) have the following properties:

• f and g are injective;

• the current h is contained both in f and in g, meaning that for every u in the current
domain of h, h(u) = f(u), and for every v in the current co-domain of h, $h^{-1}(v) = g(v)$;

• f and g connect vertices in the same connected component (in the initial graph), so
the connected components of the new graph (for f and g) are parts of the connected
components of the initial graph;

• the current value f(i) is bounded by $\max\{f(i') \mid i' \le i\}$ (for the original f);

• the current value g(j) is bounded by $\max\{g(j') \mid j' \le j\}$ (for the original g).

The last two requirements may be reformulated as follows: changes in f and g may only
decrease the quantities $\max(f(1), \ldots, f(i))$ and $\max(g(1), \ldots, g(j))$ for all i, j. So it is
enough to check the non-increase of these two quantities for each change in f and g.

At each step we take the minimal element on one of the sides that is not covered by the
current h, and add to h some pair that involves this element. Doing this alternately for
both sides, we guarantee that the final h is a bijection. How is this done? For example,
let u be the minimal left element not covered by the current h. It is currently connected to
some v = f(u). Then (1) u and v are in the same component, and (2) v is not covered by
h. Indeed, if one of the endpoints of an f- or g-edge is covered by h, then the other one is
covered by h, too (since h is included in f and g, and both f and g are injective).

Case 1: g(v) = u. This case is simple: we add (u, v) to h, leaving f and g unchanged.

Case 2: $w = g(v) \ne u$. We would like to let g(v) be u, but need to be careful and
consider several cases. Note first that w > u since w is not covered by h and all elements
smaller than u are covered (u is minimal).



Case 2.1: u is not in the image of g. Then we let g(v) := u, and g is still an injection.
We have decreased some value of g (recall that w = g(v) was greater than u), so
$\max(g(1), \ldots, g(j))$ could only decrease. Then we proceed as in case 1, i.e., add (u, v) to
h.

Case 2.2: u = g(t) for some t. Note that $t \ne v$ by injectivity of g, and we have four
vertices u, v, w, t that are in the same component, none of which is covered by h (for
the reasons explained above). We would like to exchange the values g(t) = u and
g(v) = w and let g(t) := w, g(v) := u, coming again to case 1. The exchange keeps g
injective and exchanged values are in the same component. So the only problem is that
$\max(g(1), \ldots, g(j))$ should not increase.

Case 2.2.1: t > v. In this case we exchange the values that were in the reversed order 
(t > v but g(t) = u < w = g(v)). Such an exchange can only decrease the quantity in 
question, so we come to case 1 after the exchange and add (u,v) to h. 

Case 2.2.2: t < v. In this case we give up changing g and change f instead.

Case 2.2.2.1: t is not in the image of f. Then we let f(u) := t as in case 2.1, decreasing
the value of f(u), and add (u, t) to h.

Case 2.2.2.2: t = f(s) for some s. Then $s \ne u$ and s is not covered by h, therefore s > u
since u was the minimal element not covered by h. All elements s, t, u, v are in the same
component, are not covered by h, and the values v = f(u) and t = f(s) go in the
reversed order, so we exchange them by letting f(u) := t and f(s) := v, as in case
2.2.1. Now f(u) = t and g(t) = u, so we add (u, t) to h as in case 1.

This finishes the case analysis. 

This construction may look non-effective: why do we get a computable h if we cannot distinguish
between cases where we need to make an exchange or not? This is not a problem, however.
Note that we can check whether there exists some t < v such that g(t) = u. Depending on
this, we either perform the exchange for the values of g, or for the values of f. We do not
know whether the exchange is really needed, but still can effectively write a program that
takes into account this exchange when and if it turns out to be necessary (the necessity of
exchange becomes obvious when we come to the stage when the corresponding value needs
to be computed). □

By Lemma 2.14 and Lemma 2.15, the Myhill-Schnorr theorem is proven. □

So we have studied two conditions for a machine: optimality and effective optimality. 
The second is stronger; it makes the domain of the machine m-optimal and therefore 
determines it uniquely up to a computable length-preserving (up to an $O(1)$ additive term)
bijection. For most of the results in the next sections we will only need optimality, and use
the criterion provided by Theorem 2.10.


3. Approximate algorithms for halting problem 


Now we consider the halting problem as the decision problem for the domain of some fixed
optimal machine U. Theorem 2.10 guarantees that this problem is undecidable (otherwise
the complexity $C(H_n)$ would be logarithmic in n). So there is no algorithm that correctly
answers all questions whether x belongs to the domain of U or not.

For every algorithm A (in this section we understand “algorithm” as a partial computable
function) we consider the set of inputs where A errs (as a decision algorithm for the halting
problem), i.e., the set of strings x such that $x \in \mathrm{dom}(U)$ and A(x) does not output 1,
or $x \notin \mathrm{dom}(U)$ and A(x) does not output 0. Note that there are no restrictions for
A yet. Later we will consider algorithms that are total, but may give incorrect answers
(coarse computation), and also non-total algorithms that never give wrong answers (generic
computation).

For each n we consider the fraction $\varepsilon_n(A)$ of errors among all strings of length at most
n (counting both places where A is undefined, and places where it produces a wrong answer).
We are interested in the asymptotic behavior of $\varepsilon_n(A)$ as $n \to \infty$ (since A always can be
adjusted on a finite set). The sequence $\varepsilon_n(A)$ may not converge, so we consider two quantities
$\overline{E}(A) = \limsup \varepsilon_n(A)$ and $\underline{E}(A) = \liminf \varepsilon_n(A)$. In terms of these quantities, we may
describe the behavior of the error rate as follows:


Theorem 3.1. 

(a) $\underline{E}(A) > 0$ for every $A$;

(b) there exists $\varepsilon > 0$ such that $\overline{E}(A) \ge \varepsilon$ for every $A$;

(c) for every $\varepsilon > 0$ there exists $A$ such that $\underline{E}(A) \le \varepsilon$;

(d) if we consider only algorithms that may be undefined but never produce wrong answers,
then the statement “for every $\varepsilon > 0$ there exists an algorithm $A$ of this type such that
$\underline{E}(A) \le \varepsilon$” may be true or false depending on the choice of the universal machine $U$;
for example, it always holds when $U$ is left-total and never holds when $U$ is effectively
optimal.


These results (in a different setting and with additional restrictions) were proved in [15], [19], [12];
we show that they can be easily proven using Kolmogorov complexity.
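To make the definition of the error fraction $\varepsilon_n(A)$ concrete, the following minimal sketch computes it for a toy example. Everything in it is an illustrative assumption: `toy_domain` is a decidable stand-in for $\mathrm{dom}(U)$ (the real domain is undecidable, so no ground-truth oracle exists), and a predictor signals "undefined" by returning `None`.

```python
from fractions import Fraction
from itertools import product


def strings_up_to(n):
    """Yield all binary strings of length at most n (including the empty string)."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def error_fraction(predict, in_domain, n):
    """Fraction of strings of length <= n on which `predict` errs.

    `predict(x)` returns 1, 0, or None (None models "undefined").
    `in_domain(x)` is the ground truth, available only in this toy setting.
    An undefined answer counts as an error, as in the text.
    """
    total = errors = 0
    for x in strings_up_to(n):
        total += 1
        expected = 1 if in_domain(x) else 0
        if predict(x) != expected:
            errors += 1
    return Fraction(errors, total)


# Hypothetical stand-in for dom(U): strings with an even number of 1s.
toy_domain = lambda x: x.count("1") % 2 == 0
# A total predictor that always answers "halts".
always_yes = lambda x: 1
# A partial predictor: correct on short strings, undefined on strings of length 2.
partial = lambda x: None if len(x) == 2 else (1 if toy_domain(x) else 0)
```

Here `error_fraction(always_yes, toy_domain, n)` plays the role of $\varepsilon_n(A)$ for a coarse (total, sometimes wrong) predictor, and the `partial` predictor illustrates the generic (never wrong, sometimes undefined) case.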


Proof. (a) Knowing $n$ and some upper bound $2^{n-d}$ for the number of errors (of both types:
$A$ is undefined or the value is wrong) that $A$ makes for strings of length at most $n$, we wait
until $A$ becomes defined on all strings of those lengths except for $2^{n-d}$ ones. Then we count
the number of positive answers; it differs by at most $O(2^{n-d})$ from $H_n$, the number of strings
of length at most $n$ in the domain of $U$. The difference can be specified by $n - d + O(1)$
bits, so the complexity $C(H_n \mid n)$ is bounded by $K(d \mid n) + (n - d) + O(1)$, where $K(d \mid n)$
is the prefix complexity of $d$ given $n$, the minimal length of a self-delimiting program that
maps $n$ to $d$. Note that the $O(1)$-constant depends on $A$. The bound $C(H_n \mid n) \ge n - O(1)$
(see Theorem 2.10 and its proof) then implies that $d - K(d \mid n) \le O(1)$, so $d = O(1)$. This
provides the required bound $2^{-O(1)}$ for the fraction of errors for all large enough $n$.
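The last step of part (a) can be spelled out as the following chain. This is only a sketch of one way to read it; the bound $K(d \mid n) \le 2\log d + O(1)$ is a standard fact about prefix complexity that is assumed here rather than taken from the text.

```latex
\begin{align*}
n - O(1) &\le C(H_n \mid n) \le K(d \mid n) + (n - d) + O(1), \\
d &\le K(d \mid n) + O(1) \le 2\log d + O(1).
\end{align*}
```

Since $d$ grows faster than $2\log d$, the second line can hold only for $d$ below some constant, which is the claim $d = O(1)$.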

(b) The difference with (a) is that now the threshold for the number of errors should be
the same for all approximation algorithms (while in (a) each algorithm has its own threshold).
On the other hand, we only need to show that the fraction of errors is above the threshold
infinitely often (and not necessarily for all large enough $n$). So the same argument with
a small modification works. It can be used with $A$ as a parameter to prove that

$$C(H_n \mid n) \le K(d \mid n) + (n - d) + K(A \mid n) + O(1),$$

if the number of errors is bounded by $2^{n-d}$, and now $O(1)$ does not depend on $A$. It remains
to note that for some $c$ and every $A$ there exist infinitely many $n$ such that $K(A \mid n) \le c$
(consider $n$ whose binary representation starts with a self-delimiting program for $A$). We
now get (in the same way as before) the uniform lower bound for the number of errors that
an arbitrary algorithm $A$ makes (but only for $n$ that make $A$ simple).

(c) For each $n$ consider $p_n$, the fraction of strings of length at most $n$ that belong to
the domain of $U$. We will see later that $p_n$ does not converge, but we may still consider
$p = \limsup_n p_n$. Now consider a rational number $r$ that is smaller than $p$ but is very close
to it. There are infinitely many lengths for which the fraction of terminating computations
exceeds $r$, and these lengths can be discovered ultimately. We may find a computable
increasing sequence of lengths having this property. Consider some length $n$ in this sequence.
Imagine that we run $U$ for all inputs of length at most $n$ until we get a fraction $r$ of
terminating computations (i.e., corresponding inputs for $U$). This will happen at some point
(according to the construction of the sequence). If we give positive answers for these inputs
and negative answers for all others, the error rate is small if $n$ is sufficiently large. Indeed,
since $\limsup p_n = p$, the value of $p_n$ cannot exceed $p$ significantly, and $r$ is close to $p$, so we
have found all positive answers except for a small fraction. This works for each large enough
$n$ from the sequence, but different $n$ can lead to different answers for the same input. Let
us assume that the lengths in the sequence increase fast enough (this can be done without
loss of generality) and agree that we use the minimal length in the sequence that covers our
input. If $n_i$ and $n_{i+1}$ are neighboring elements of the sequence, then $n_{i+1}$ is used for all strings
of length between $n_i$ and $n_{i+1}$, and strings of length at most $n_i$ form a negligible fraction
among strings of length at most $n_{i+1}$.
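The core step of the argument for (c), for one fixed length $n$ from the sequence, can be sketched as follows. This is an illustration under stated assumptions, not the construction itself: `halts_within(x, t)` is a hypothetical step-bounded simulator, `toy_halts_within` is an invented toy machine, and the rational `r` is simply passed in (in the proof it exists but cannot be found effectively).

```python
from fractions import Fraction
from itertools import count, product


def strings_up_to(n):
    """Yield all binary strings of length at most n."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def approximate_halting(n, r, halts_within):
    """Run all inputs of length <= n with growing time bounds until at least
    an r-fraction of them has halted; answer 1 for those and 0 for the rest.

    Loops forever if fewer than an r-fraction of the inputs ever halt,
    which is why n must come from the computable sequence in the text.
    """
    inputs = list(strings_up_to(n))
    for t in count(1):
        halted = {x for x in inputs if halts_within(x, t)}
        if Fraction(len(halted), len(inputs)) >= r:
            return {x: int(x in halted) for x in inputs}


def toy_halts_within(x, t):
    """Hypothetical toy machine: halts exactly on strings with an odd number
    of 1s, after int(x, 2) steps."""
    return x.count("1") % 2 == 1 and int(x, 2) <= t


answers = approximate_halting(3, Fraction(2, 5), toy_halts_within)
```

In this toy run 7 of the 15 inputs eventually halt, the search stops once 6 of them have been seen, and the only error is the slow input `111`, which is answered 0. The closer `r` is to the true halting fraction, the fewer such misses remain.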

(d) The claim here consists of two parts, positive and negative. For the positive
part we need to show that for a left-total machine one can construct an algorithm that
never provides incorrect answers and for infinitely many $n$ terminates on most strings of
length at most $n$ (to be exact, on more than a $(1 - \varepsilon)$-fraction of them). Let us explain
informally why left-totality helps. If we know how many strings of given length $n$ appear
in the left-total set, we know which strings are there (an initial segment of this length). If we
have some lower and upper bounds for this number that differ by some small $k$, we can
provide correct answers for all strings except for $k$ strings in the middle. However, this
argument works only for strings of length exactly $n$, and not for strings of length at most $n$,
so the arguments used to prove (c) do not work now and should be changed.
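The informal explanation of why left-totality helps can be illustrated by a small sketch. It assumes, as the paragraph suggests, that the $n$-bit part of a left-total set is an initial segment of the $n$-bit strings in lexicographic order; the function name and the way the bounds are supplied are hypothetical.

```python
def answer_left_total(x, lower, upper):
    """Decide membership of an n-bit string x in a set whose n-bit part is an
    initial segment (in lexicographic order) of the n-bit strings.

    `lower` and `upper` are assumed bounds on the number of n-bit strings
    in the set. Returns 1 (in the set), 0 (not in the set), or None when
    the bounds do not settle the answer. It never returns a wrong answer
    as long as the bounds are correct.
    """
    n = len(x)
    if not 0 <= lower <= upper <= 2 ** n:
        raise ValueError("bounds must satisfy 0 <= lower <= upper <= 2**n")
    index = int(x, 2) if x else 0  # position of x among the n-bit strings
    if index < lower:
        return 1      # among the first `lower` strings: certainly in the set
    if index >= upper:
        return 0      # beyond the first `upper` strings: certainly not in the set
    return None       # one of the (upper - lower) strings "in the middle"
```

With bounds that differ by $k$, exactly $k$ strings of length $n$ remain unanswered, which matches the count in the text.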

For each $n$ let us consider the fraction $r_n$ of strings of length exactly $n$ that belong to
$\mathrm{dom}(U)$. As before, there exists some $\tau = \limsup r_n$. Consider two rational numbers $r < \tau$
and $r' > \tau$ close to each other. According to the definition of limsup, we have $r_n < r'$ for
all $n$ greater than some $N$; there are infinitely many $n > N$ such that $r_n > r$. The values
of $n$ such that $r_n > r$ can be enumerated. Therefore, there exists an infinite computable
increasing sequence $n_1 < n_2 < n_3 < \ldots$ of integers such that for each $i$ the fraction $r_{n_i}$ of
$n_i$-bit strings that belong to $\mathrm{dom}(U)$ is between $r$ and $r'$. The difference between $r$ and $r'$
can be made arbitrarily small (though the values of $r$, $r'$, and $N$ cannot be found effectively;
they just exist). So we can give correct answers for strings of length $n_i$ without errors and
with few omissions, saying “yes” for the bottom $r$-fraction and “no” for the top $(1 - r')$-fraction.

This is not enough for us, since in our definition the fraction of prediction failures is
calculated in the set of all strings of length at most $n$, and even if we know everything for
$n$-bit strings, this covers only half of the strings in question. So in this way we cannot make
the error less than 1/2.

But we can repeat the trick: consider the lengths $n_1 - 1, n_2 - 1, n_3 - 1, \ldots$ and the
fractions $r_{n_i - 1}$ of strings of these lengths that belong to $\mathrm{dom}(U)$. The sequence $r_{n_i - 1}$ again
has some limsup, and one can find rational numbers $s < s'$ that are close to each other,
and some computable subsequence $m_1 < m_2 < m_3 < \ldots$ of the sequence $n_i$ such that
$r_{m_i - 1}$ is always between $s$ and $s'$. In this way we can provide correct answers for almost all
strings of two subsequent lengths $m_i - 1$ and $m_i$, thus reducing the error from 1/2 to 1/4
(approximately).

This construction can be repeated once more, giving error rate close to 1/8 (for lengths 
in some computable subsequence), etc. Repeating this trick finitely many times, we get 
arbitrarily small error. Note that for this we need only finitely many bits of advice, so this 
still gives a partial computable predictor. The statement (d) is proven. 

For the negative part of (d), when U is effectively optimal, we follow Lynch [15] : 

Lemma 3.2. There exists a sparse simple set S. 

Recall that a simple set is an enumerable set whose complement is infinite but does not
contain an infinite enumerable subset; sparsity means that the fraction of strings of length
$n$ that belong to $S$ tends to 0 as $n \to \infty$.

Proof of Lemma 3.2. Let $S$ be the set of highly compressible strings, i.e., strings $x$ such
that $C(x) < |x|/2$. There are at most $O(2^{n/2})$ compressible strings of length $n$, so $S$ is sparse.
If the complement of $S$ contains an infinite enumerable subset, then for every $N$ we can
enumerate this set until we get a string of length greater than $2N$ and complexity greater
than $N$, and this string has complexity only $O(\log N)$, a contradiction. □

An algorithm that tries to decide a sparse simple set S and is not allowed to give wrong 
answers can be defined only on a set of density 0: it can give positive answers only for 
elements of S, and they are rare; on the other hand, it can give only finitely many negative 
answers since S is simple. 

As we have seen, every enumerable set is reducible to the m-optimal set $\mathrm{dom}(U)$ by
a length-bounded total computable injection. Consider such a reduction for the sparse
simple set $S$. This reduction can be used to convert an approximate algorithm for $\mathrm{dom}(U)$
without errors to an approximate algorithm for $S$ without errors, and the latter algorithm
is undefined almost everywhere. Since the reduction is injective and length-bounded, the
image has positive lower density, and it becomes a lower bound for $\underline{E}(A)$ for every algorithm
$A$ that does not make errors. □


Our definition of error rate considers all strings of length at most $n$ to have equal
importance. Thus it can be interpreted in a probabilistic fashion: if $A$ is an approximate
algorithm, the probability that $A$ makes an error on an input $p$ chosen uniformly among strings
of length at most $n$ is greater than $\varepsilon$ for some $\varepsilon > 0$ independent of $n$. It is natural to combine this implicit
randomness with explicit randomness in the algorithm, i.e., allow $A$ to be a probabilistic
algorithm itself. For a randomized algorithm $A$ (that uses fair coin tossing) we define $\varepsilon_n(A)$
as the probability of error on a random input uniformly distributed in the set of all strings of
length at most $n$. (We assume, as usual, that the random bits used in the computation are
independent of the random choice of an input string.) Then $\overline{E}(A)$ and $\underline{E}(A)$ are defined in
the same way as before. We show now that the negative results of Theorem 3.1 (a) and (b)
remain valid for this setting; note that the algorithms without errors mentioned in (d) do not
make much sense in this probabilistic setting (if a probabilistic algorithm never gives wrong
answers, it can be replaced by a deterministic search over all possible values of random bits).


Theorem 3.3. For probabilistic algorithms we still have: (a) $\underline{E}(A) > 0$ for every $A$; (b)
there exists $\varepsilon > 0$ such that $\overline{E}(A) \ge \varepsilon$ for every $A$.

Proof. Knowing $n$ and some upper bound $2^{-d}$ for the probability of error on strings of
length at most $n$, we may emulate the behavior of $A$ on all inputs and for all combinations
of random bits until the probability not to get an answer (correct or incorrect) goes below
$2^{-d}$. In other words, for every string $x$ of length at most $n$ we have $1 = p(x) + n(x) + u(x)$,
where $p(x)$, $n(x)$, and $u(x)$ are the probabilities of a positive answer, a negative answer, and no
answer respectively. The simulation provides better and better lower bounds for $p$ and $n$,
and upper bounds for $u$, and we wait until the average of the upper bounds for $u(x)$ (taken
over all strings of length at most $n$) goes below $2^{-d}$. (This must happen since we assumed
the probability of error to be less than $2^{-d}$.)

After that we note that $\sum_x p(x)$ (the sum of current approximations taken over all strings of
length at most $n$) provides the approximation for the number of strings of length at most $n$
that belong to $\mathrm{dom}(U)$ (assuming that the error probability is indeed bounded by $2^{-d}$). One
can also use $\sum_x (1 - n(x))$; both quantities are $O(2^{n-d})$-close to this number. Indeed, let us
consider the first quantity: $H_n - \sum_x p(x)$ can be written as $\sum_x (\chi_{\mathrm{dom}(U)}(x) - p(x))$, where
$\chi_{\mathrm{dom}(U)}$ is the indicator function for $\mathrm{dom}(U)$, and the sum is taken over all $x$ of length
at most $n$. The latter sum can be bounded by $\sum_x |\chi_{\mathrm{dom}(U)}(x) - p(x)|$. Each summand
is bounded by the probability of failure for the specific $x$: for $x \in \mathrm{dom}(U)$ it is exactly the
probability of failure, for $x \notin \mathrm{dom}(U)$ it is smaller (since the algorithm may produce no
answer for this $x$). We know by assumption that the average value of the probability of
failure is less than $2^{-d}$, thus the sum $\sum_x |\chi_{\mathrm{dom}(U)}(x) - p(x)|$ is bounded by $O(2^{n-d})$.
Therefore, we get the upper bound

$$n \le C(H_n \mid n) \le n - d + O(K(d \mid n)) = n - (d - O(\log d))$$

(we omit $O(1)$-terms), therefore $d - O(\log d) \le O(1)$ and $d = O(1)$ as required by (a). Here
the $O(1)$ term depends on $A$; if $A$ is not fixed, we get the inequality

$$d - O(\log d) \le K(A \mid n) + O(1),$$

which gives (b) since $K(A \mid n) \le O(1)$ for infinitely many $n$. □


4. Halting rate and algorithmic randomness 


In this section we consider a different question: instead of deciding the halting problem we
study just the fraction of inputs where the optimal machine is defined, and its asymptotic
behavior. We consider some optimal machine $U$. Let $H_n$ be the number of inputs of length
at most $n$ on which $U$ halts, and let $p_n^U$ be the fraction of inputs of length at most $n$ on
which $U$ halts. More precisely, we define $p_n^U = H_n / 2^{n+1}$; technically we should put $2^{n+1} - 1$
in the denominator, but we ignore this small difference to simplify the notation. As we are
only interested in the asymptotic behavior of $p_n^U$, this makes no difference. We fix some
optimal machine $U$ and write $p_n$ instead of $p_n^U$ when this machine $U$ is considered.

Theorem 4.1. 

(a) For every computable sequence $r_n$ of rational numbers the difference $|p_n - r_n|$ is separated
from 0 (i.e., is greater than $\varepsilon$ for some $\varepsilon > 0$ and almost all $n$).

(b) The sequence $p_n$ does not converge (as $n \to \infty$).

(c) All limit points of $p_n$ are Martin-Löf random relative to $\mathbf{0}'$.

(d) The limsup of $p_n$ is upper semicomputable relative to $\mathbf{0}'$ (and Martin-Löf random
relative to $\mathbf{0}'$, see the previous claim).

(e) The converse holds: every real in $[0,1]$ that is upper semicomputable relative to $\mathbf{0}'$ and
Martin-Löf random relative to $\mathbf{0}'$ is the lim sup of $p_n^V$ for some optimal machine $V$.
The machine $V$ can be made effectively optimal, too.


Note that (a)-(d) are true for every optimal $U$ (it is easy, for example, to construct an
optimal $U$ such that the corresponding sequence $p_n$ does not converge, but we claim that this
happens for an arbitrary optimal $U$).


Proof. (a) We use the same complexity bound $C(H_n \mid n) \ge n - O(1)$ for the number $H_n$ of
strings of length at most $n$ in the domain of $U$ (see Theorem 2.10 and its proof). If the
difference between $r_n$ and $p_n$ does not exceed $2^{-d}$, then

$$C(H_n \mid n) \le K(d \mid n) + (n - d) + O(1)$$

(to specify $H_n$ given $n$ we provide a self-delimited program for $d$ and attach $n - d + O(1)$ bits
for the difference between $r_n 2^{n+1}$ and $H_n$), which gives $d = O(1)$.

(b) This is a simple corollary of (a): consider a computable sequence $r_n$ for which all
the points in $[0,1]$ are limit points. If $p_n$ converges to some $p$, then $p_n$ is close to $p$ for all
large $n$, and $r_n$ is close to $p$ infinitely often, and we get a contradiction with (a).

(c) For this proof we need to use a theorem by Miller (see also [2] for a simple
proof): a real number $x \in [0,1]$ is Martin-Löf random relative to $\mathbf{0}'$ if and only if, when
viewing $x$ as a bit sequence, there is a constant $c$ such that for every prefix $\sigma$ of $x$, there is
a finite string $\tau$ extending $\sigma$ such that $C(\tau) > |\tau| - c$.

Suppose $x$ is a limit point of $p_n$. First note that $x$ cannot be a rational number
(otherwise the constant sequence $r_n = x$ would approximate $p_n$ infinitely often, contradicting (a)), so $x$ has a unique binary
representation. Let $\sigma$ be a prefix of $x$ and let $k$ be the length of $\sigma$. Split $[0,1]$ into $2^k$
equal intervals of size $2^{-k}$. Then $x$ is strictly inside one of these intervals (this interval
consists of all binary extensions of $\sigma$). Since $x$ is a limit point, some $p_n$ also belongs to this
interval. Recall that $p_n$ is a binary fraction $H_n/2^{n+1}$ (here it is important that we use this
denominator, not $2^{n+1} - 1$; of course, this does not change the limit points). Therefore,
$H_n$ (considered as a string of length $n + 1$ with leading zeros) is an extension of $\sigma$, and
$C(H_n) \ge |H_n| - O(1)$ due to Theorem 2.10, so it remains to use Miller’s result.

Item (c) is proven. We do not know whether the converse holds, i.e., whether any real
that is Martin-Löf random relative to $\mathbf{0}'$ is a limit point of the sequence $p_n^V$ for some
optimal machine $V$. However, in (d) and (e) we give a full characterization of the reals that are
limsups of those sequences.

(d) Let us prepare ourselves by considering a simpler question. Assume that $X$ is an
arbitrary enumerable set, i.e., the domain of some machine, not necessarily an optimal one;
$x_n$ is the number of strings of length exactly $n$ in $X$, and $X_n$ is the number of strings of
length at most $n$ in $X$ (so $X_n = x_0 + \ldots + x_n$). Consider the upper density of $X$, i.e.,
$\limsup X_n/2^{n+1}$. Which reals can appear as upper densities of enumerable sets?

Lemma 4.2. A real number $x$ in $[0,1]$ is the upper density of some enumerable set $X$ if
and only if $x$ is upper semicomputable relative to $\mathbf{0}'$.


Proof of Lemma 4.2. In one direction: $u_n = X_n/2^{n+1}$ is a uniformly lower semicomputable
sequence of reals, and one can easily show that the lim sup of such a sequence is upper
semicomputable relative to $\mathbf{0}'$. Indeed, $\limsup u_n < r$ means that there exist $r' < r$ and
some $N$ such that $u_n \le r'$ for all $n \ge N$, so we get an $\exists\forall$-property of $r$ that corresponds to
a $\mathbf{0}'$-enumerable set of rational numbers.

Reverse direction: assume that $x$ is upper semicomputable relative to $\mathbf{0}'$. It is known
that in this case $x$ can be represented as $\limsup k_n$ for some computable sequence $k_n$ of
rational numbers (see [22]). Then $x = \lim_n v_n$, where $v_n = \sup(k_n, k_{n+1}, \ldots)$ form a
uniformly lower semicomputable sequence. We may assume without loss of generality that
$v_n \in [0,1]$ (since the limit is in $[0,1]$) and that $v_n$ are rational numbers with denominator
$2^n$ (by rounding; note that the resulting sequence $v_n$ may not be computable, only lower
semicomputable). Then we consider an enumerable set $X$ that contains exactly $v_n 2^n$ strings
of length $n$ (here we use that $v_n$ are lower semicomputable). It is easy to see that the upper
density of $X$ is $x$; in fact, the density (the limit, not only lim sup) exists and is equal to $x$,
since the fraction of $n$-bit strings in $X$ converges to $x$ as $n \to \infty$. □
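The final claim of this proof, that a convergent per-length fraction forces the cumulative density $X_n/2^{n+1}$ to converge to the same limit, can be checked numerically. The sketch below is only an illustration: the sequence `v` is a hypothetical computable example decreasing to 1/3, whereas in the lemma $v_n$ is merely lower semicomputable.

```python
from fractions import Fraction


def cumulative_density(per_length_fraction, n):
    """Return X_n / 2^(n+1) for a set that contains floor(v_k * 2^k) strings
    of each length k <= n, where v_k = per_length_fraction(k)."""
    count = sum(int(per_length_fraction(k) * 2 ** k) for k in range(n + 1))
    return Fraction(count, 2 ** (n + 1))


def v(k):
    """Hypothetical per-length fraction, decreasing to the limit 1/3."""
    return Fraction(1, 3) + Fraction(1, k + 2)
```

Most strings of length at most $n$ have length close to $n$, so the long lengths dominate the count and the cumulative density tracks the per-length fraction: `cumulative_density(v, n)` approaches 1/3 as `n` grows.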

It remains to show that for $x$ that are not only upper semicomputable relative to $\mathbf{0}'$
but also Martin-Löf random relative to $\mathbf{0}'$, the set $X$ can be made a domain of an optimal
machine. Our next step is the following simple observation.


Lemma 4.3. If some real $x \in [0,1]$ is the upper density of the domain of some optimal
machine, the same is true for $x/2$ and $(1 + x)/2$.

(In terms of binary representation $x/2$ is $0x$, and $(1 + x)/2$ is $1x$.)

Proof. For $x/2$ we just “shift” the domain of the optimal machine by adding leading 0’s to
all the arguments. For $(1 + x)/2$ we do the same and also add all strings starting with 1 to
the domain (with arbitrary values, e.g., they all can be mapped to the empty string). In
both cases the machine remains optimal; the complexity increases only by 1. □
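The effect of the two shifts on densities can be verified on a toy set. In the sketch below, sets of strings are represented by membership predicates and `toy_member` is an arbitrary illustrative choice; it checks only the density arithmetic, not optimality of any machine.

```python
from fractions import Fraction
from itertools import product


def strings_up_to(n):
    """Yield all binary strings of length at most n."""
    for length in range(n + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def density(member, n):
    """Number of strings of length <= n satisfying `member`, divided by 2^(n+1)."""
    return Fraction(sum(1 for x in strings_up_to(n) if member(x)), 2 ** (n + 1))


def shift_zero(member):
    """Membership test for {0w : w in X}."""
    return lambda x: x.startswith("0") and member(x[1:])


def shift_one(member):
    """Membership test for {0w : w in X} together with all strings starting with 1."""
    return lambda x: x.startswith("1") or (x.startswith("0") and member(x[1:]))


# Hypothetical toy set X: strings ending with 1.
toy_member = lambda w: w.endswith("1")
```

For every $n$, the density of the shifted set at length $n + 1$ is exactly half the density of $X$ at length $n$, and adding the strings that start with 1 contributes $1/2 - 2^{-(n+2)}$, which tends to $1/2$.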


Deleting the first bit preserves randomness and semicomputability, so we may assume
without loss of generality that $x$ (which is random and upper semicomputable relative to
$\mathbf{0}'$) is smaller than 1/2 (starts with 0), and then apply Lemma 4.3 to add leading 1’s.

Now we are ready to use another known result: every random upper semicomputable $x$ is
Solovay complete among upper semicomputable reals (all properties are considered relative to
$\mathbf{0}'$); according to one of the equivalent definitions of Solovay completeness, this means that
for every other upper semicomputable (relative to $\mathbf{0}'$) $y$ and for any large enough integer $N$
there exists another upper semicomputable (relative to $\mathbf{0}'$) $z$ such that $x = y/N + z$.

This result combines the work of Calude et al. [5] and Kučera-Slaman [13] (see [3] for
a survey of these results). Technically these papers consider lower semicomputable reals
instead of upper semicomputable ones, but randomness is stable under sign change, so this
does not matter. Also we need a relativized version of their result; as usual, relativization is
straightforward.

So let us assume that $x \in (0, 1/2)$ and $x = y/2^d + z$ where $y$ is the upper density for some
optimal machine $U$ and $z$ is upper semicomputable relative to $\mathbf{0}'$. (The large denominator
$N$ is chosen to be a power of 2.) Now we combine two tricks used for Lemmas 4.2 and 4.3.
Namely, we apply Lemma 4.2 to $2z$ (note that $z < 1/2$), and then add leading 1’s to all the
strings in the corresponding set. This gives us density $z$ while using only the right half of the
binary tree (strings that start with 1). Then we add $d$ zeros to all strings in the domain
of $U$ as we did when proving Lemma 4.3; note that $d \ge 1$ since $x < 1/2$. This gives us
density $y/2^d$ using only the left half of the binary tree (actually, a small part of it, if $d$ is
large). Then we combine both parts and get a machine $V$ that is optimal (since the left
part is optimal) and has upper density $y/2^d + z$ as required. (Note that in general limsup
is not additive, but in our case we have not only limsup, but limit in one of the parts, so
additivity holds.)

If we start with an effectively optimal machine U , the machine V will also be effectively 
optimal (for obvious reasons). 

Theorem 4.1 is proven. □


5. Approximations as mass problems 


The approach to probabilistic computations used in Theorem 3.3 looks natural from a
computer science perspective. However, computability theorists would probably prefer
another approach inspired by the notion of mass problems. A mass problem is a set of total
functions $\mathbb{N} \to \mathbb{N}$ (in other words, a subset of the Baire space $\mathbb{N}^{\mathbb{N}}$ of integer sequences). Its
elements are called solutions of this problem. A problem is solvable if it has at least one
computable solution; a problem $\mathcal{A}$ is reducible to $\mathcal{B}$ if there is an oracle machine $\Gamma$ such
that $\Gamma^X \in \mathcal{A}$ whenever $X \in \mathcal{B}$. The full name of this reducibility is Medvedev reducibility
or strong reducibility (compared to weak reducibility, or Muchnik reducibility, where the
oracle machine $\Gamma$ may depend on $X \in \mathcal{B}$). Many people have studied the corresponding degree
structure, called the Medvedev lattice.

It is interesting to study mass problems from the viewpoint of probabilistic computations.
For example, one may ask whether a mass problem can be solved with probability 1 by 
some randomized algorithm (such an algorithm, given random oracle, produces a solution 
with probability 1). One may also ask whether a mass problem can be solved with positive 
probability. (Both properties are downward closed with respect to strong reducibility; the 
second one is closed also with respect to weak reducibility.) 

The notion of a mass problem allows us to reformulate the results of Section 3 in the
following way. Let $A$ be a set that we want to decide (approximately). If $a$ is a total
function, we may measure how well $a$ approximates the characteristic (indicator) function
$\chi_A$ of $A$. For a given $n$ we consider $\varepsilon_n$, the fraction of strings of length at most $n$ where $a$
and $\chi_A$ differ, and then we may define $\overline{E}(a, A) = \limsup \varepsilon_n$ and $\underline{E}(a, A) = \liminf \varepsilon_n$. Now
we may consider, for a given set $A$, the mass problem

$$\mathcal{C}(A) = \{a : \overline{E}(a, A) = 0\};$$

its solutions are total functions that compute $\chi_A$ on a set of density 1. (As usual, we identify
binary strings with integers, so $a$ is both a function defined on strings and a sequence as
required by the definition of mass problems. The density of a set $X$ of strings is defined
as the limit of the fraction $x_n$ of strings of length at most $n$ that belong to $X$. If this
limit does not exist, we may still speak about lower density and upper density defined as
$\liminf x_n$ and $\limsup x_n$.)

If $\mathcal{C}(A)$ has computable solutions, the set $A$ is called coarsely computable. Theorem 3.1
implies that the domain of an optimal machine is not coarsely computable. Moreover, the
statement (a) of this theorem guarantees that an (easier) mass problem $\{a : \underline{E}(a, A) = 0\}$
has no computable solution either.

Here the difference with the probabilistic setting becomes apparent: while the first
problem $\mathcal{C}(A)$ cannot be solved by a probabilistic algorithm with positive probability (see
Theorem 5.3 below), the second one can, as the following result shows:


Theorem 5.1. For every enumerable set $A$ there exists a randomized algorithm that with
positive probability computes a total function $a$ such that $\underline{E}(a, A) = 0$.


In other terms, there exists an oracle machine $\Gamma$ such that the set of oracles $X$ for which
$\Gamma^X$ computes a total function $a$ such that $\underline{E}(a, A) = 0$ has positive probability according
to the uniform Bernoulli distribution on the Cantor space (corresponding to independent
fair coin tosses).

Proof. To make the proof shorter, we use known results about generic sequences. A
sequence $G$ in the Baire space is called generic (more precisely, 1-generic) if for every
effectively open set $U$ it is contained either in $U$ or in the interior of the complement
of $U$. In other terms, for every enumerable set $W$ of finite sequences of integers, either $G$
has a prefix in $W$, or $G$ has a prefix that has no extensions in $W$. It is easy to see (using
the Baire category theorem) that generic sequences exist. Moreover, as Kurtz has shown
(see [9, Theorem 8.21.3]), generic sequences can be generated by a randomized algorithm
with positive probability. So it remains to prove the following lemma:


Lemma 5.2. There exists an oracle machine $\Gamma$ such that for every generic $G$ the machine
$\Gamma$ with oracle $G$ computes a total function $a$ with $\underline{E}(a, A) = 0$.

Proof of Lemma 5.2. The function $a$ is constructed as follows: we split all the possible
lengths into consecutive intervals by thresholds $n_0 < n_1 < n_2 < \cdots$, and for lengths in





$[n_{i-1}, n_i)$ we run the enumeration of $A$ for $N_i$ steps (and use the resulting part of $A$ for $a$).
Here $n_i$ and $N_i$ are parameters that are taken from the generic sequence $G$; we interpret
the $i$-th term $G_i$ as an encoding of some pair $(n_i, N_i)$.
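The way the function $a$ is assembled from the pairs $(n_i, N_i)$ can be sketched as follows. Several simplifications are assumptions of this illustration: the pairs are given as a finite list rather than decoded from an infinite generic sequence, "running the enumeration for $N_i$ steps" is modeled as taking the first $N_i$ enumerated elements, strings shorter than $n_0$ are handled by the first pair, and `toy_enumerate_A` is an invented enumerable set.

```python
from itertools import islice


def make_approximation(enumerate_A, pairs):
    """Build the function a from thresholds (n_i, N_i).

    For a string whose length lies below n_i (and not below the previous
    threshold), answer according to the first N_i elements that the
    enumeration of A produces. `pairs` must have increasing n_i.
    """
    def a(x):
        for n_i, N_i in pairs:
            if len(x) < n_i:
                found = set(islice(enumerate_A(), N_i))
                return int(x in found)
        # With an infinite sequence of pairs this case would not occur.
        raise ValueError("no threshold covers this length")
    return a


def toy_enumerate_A():
    """Hypothetical enumeration of the toy set A = {1, 11, 111, ...}."""
    x = "1"
    while True:
        yield x
        x += "1"


a = make_approximation(toy_enumerate_A, [(2, 1), (4, 2)])
```

In this toy run `a("111")` is 0 although `111` belongs to the toy set: with $N_i = 2$ the enumeration has not reached it yet. Choosing $N_i$ larger removes such errors, which is the freedom the genericity argument exploits.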

Why does this help? Let us prove that for infinitely many values of $n$ the fraction of
errors among strings of length at most $n$ is small. Let $p_n$ be the fraction of strings of length
at most $n$ that belong to $A$, and let $p$ be the lim sup of $p_n$. Consider a rational number $r$
that is strictly smaller than $p$ but very close to it. Whatever the numbers $n_0, \ldots, n_{i-1}$ and
$N_0, \ldots, N_{i-1}$ (used in the construction of $a$) are, it is always possible to choose $n_i$ and $N_i$
in such a way that the density of $a$ among strings of length at most $n_i$ exceeds $r$. This
inequality, considered as a property of the finite sequence $(n_0, N_0), \ldots, (n_i, N_i)$, is (computably)
enumerable, so we get a dense effectively open set. According to the definition of genericity,
the sequence $G$ should have a prefix in this effectively open set (its complement has an empty
interior). The same argument can be applied to sequences of lengths greater than some
threshold, so we conclude that for generic $G$ the corresponding sequence $a$ has infinitely
many prefixes for which the error density $\varepsilon_n$ is small.

There is a small technical detail in this argument: we need to ensure that $n_i \gg n_{i-1}$, so
the strings of length at most $n_{i-1}$ form an asymptotically negligible fraction in the set of all
strings of length at most $n_i$. But this is easy to achieve: we may require, for example, that
$n_i > n_{i-1} + i$. □

Theorem 5.1 is proven. □


The statement (b) of Theorem 3.1 remains true for randomized algorithms. Consider
again an optimal machine $U$ and its domain $\mathrm{dom}(U)$.


Theorem 5.3. There exists some $\varepsilon > 0$ such that no randomized algorithm computes $a$
with $\overline{E}(a, \mathrm{dom}(U)) < \varepsilon$ with positive probability.


The value of e with this property depends on the choice of the optimal machine U. In 
this statement we may allow a to be non-total (and consider the places where a is undefined, 
as error points). This makes the interpretation of this result in terms of mass problems a 
bit more difficult, since mass problems are (by definition) sets of total functions. But it 
is nevertheless possible: we can consider the mass problem of enumerating the graph of a
partial function $a$ that approximates $\mathrm{dom}(U)$. For the proof, however, this interpretation is
not needed, and we do not go into details.


Proof. For this proof we need to extend the inequality used to prove Theorem 3.1.


Lemma 5.4. Consider some oracle machine $\Gamma$ with random oracle $X$ that tries to decide
$\mathrm{dom}(U)$. For some length $n$ and for some threshold $d$ consider the event “the fraction
of errors made by $\Gamma^X$ on strings of length at most $n$ is at most $2^{-d}$”; let $p(\Gamma, n, d)$ be its
probability. Then

$$K(H_n \mid n) \le O(\log d) + (n - d) + K(\Gamma \mid n) + \log \frac{1}{p(\Gamma, n, d)},$$

where $H_n$ is the number of strings of length at most $n$ in $\mathrm{dom}(U)$.


For deterministic algorithms (and $p = 1$) we have seen this bound earlier in the proof of
Theorem 3.1 with $C(H_n \mid n)$ instead of $K(H_n \mid n)$ (we did not need $K(H_n \mid n)$ there, but
the same argument would work for $K$ too). Now the prefix-free version of complexity is
technically convenient since it is better adapted to oracles and probability.

Proof of Lemma 5.4. Let $X$ be a “good” value of the oracle for which this event happens.
Assume that we know $n$ and $d$ is given as advice. Then we wait until the machine $\Gamma^X$
provides answers for all strings of length at most $n$ except for a $2^{-d}$ fraction, and count the
number of positive answers. This number differs from $H_n$ by $O(2^{n-d})$, and the difference
can be specified by $n - d + O(1)$ bits (in a self-delimited way when $n$ and $d$ are known, so
we can put prefix complexity $K$ in the left-hand side). So we get

$$K^X(H_n \mid n) \le O(\log d) + (n - d) + K(\Gamma \mid n)$$

(complexity is relativized by $X$, the length $n$ is given as a condition, and a self-delimiting
description of $d$ uses only $O(\log d)$ bits). It remains to apply a general statement about
complexity: if for fixed $a$, $b$, $k$ and for random $X$ the inequality $K^X(a \mid b) \le k$ holds with
probability at least $p$, then $K(a \mid b) \le k + \log \frac{1}{p} + O(1)$. This statement can be easily proven
by switching to a priori probabilities and taking the average over random oracles $X$. □
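One possible way to fill in the averaging step is sketched below. The notation is an assumption of this sketch: $m(a \mid b)$ denotes the conditional a priori probability, $m^X$ its version relativized to the oracle $X$, and the coding theorem $K^X(a \mid b) = -\log m^X(a \mid b) + O(1)$ is assumed to hold with a constant that does not depend on $X$.

```latex
\begin{align*}
\mu(a \mid b) &:= \int m^X(a \mid b)\, dX \;\ge\; p \cdot 2^{-k - O(1)}, \\
m(a \mid b) &\ge 2^{-O(1)} \mu(a \mid b), \\
K(a \mid b) &= -\log m(a \mid b) + O(1) \;\le\; k + \log\frac{1}{p} + O(1).
\end{align*}
```

The first line uses that $m^X(a \mid b) \ge 2^{-k - O(1)}$ on a set of oracles of measure at least $p$; the second uses that the average $\mu$ is a lower semicomputable semimeasure and is therefore dominated by the maximal one.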


To prove the theorem, we need to find $\varepsilon > 0$ such that no randomized algorithm can
compute $a$ such that $\overline{E}(a, \mathrm{dom}(U)) < \varepsilon$ with positive probability. First of all we note that
the Lebesgue density theorem implies that every set of positive measure forms a majority in 
some interval, and we can consider only oracles from this interval. So we can replace “with 
positive probability” by “with probability greater than 1/2” without loss of generality. (Any 
other constant instead of 1 /2 will work, too, but it is important to have some fixed constant 
since the choice of e should not depend on the success probability.) 

Assume that for some $d$ and for some oracle machine $\Gamma$ with random oracle $X$ the
event “limsup of error rate of $\Gamma^X$ is less than $2^{-d}$” has probability at least 1/2. For each
$n$ there is a set $T_n$ of oracles that give an error rate of at most $2^{-d}$ on strings of length
at most $n$. The event mentioned above (that has probability at least 1/2) is included in
$\liminf T_n = \bigcup_N \bigcap_{n \ge N} T_n$. If this event has probability at least 1/2, then some intersection
$\bigcap_{n \ge N} T_n$ has probability at least 1/3. This implies that the probabilities of $T_n$ for all sufficiently
large $n$ are at least 1/3. Applying Lemma 5.4 and the inequality $K(H_n \mid n) \ge C(H_n \mid n) \ge n$
(that holds with $O(1)$-precision), we conclude that $d - O(\log d) \le K(\Gamma \mid n) + O(1)$ for
sufficiently large $n$, where the constant in $O(1)$ does not depend on $\Gamma$ and $n$. Now, knowing
this constant and recalling that for every $\Gamma$ the complexity $K(\Gamma \mid n)$ is $O(1)$ for infinitely
many $n$, we may choose $\varepsilon$ small enough, and finish the proof of Theorem 5.3. □

For the last result in this section we should recall two paradigms of approximate 
computation that have received a lot of attention in the recent literature, the so-called coarse 
computability and generic computability (see [10]). We already considered, given some set 
$A$, the mass problem “coarsely compute $A$” that consists of total functions $a$ that coincide 
with $\chi_A$ on a set of density 1. If this problem has a computable solution, i.e., if there exists 
a total computable function $a$ that coincides with $\chi_A$ on a set of density 1, the set $A$ is 
called coarsely computable. 

Generic computability of A is defined in a similar way, but now wrong answers are not 
allowed, and a may be a non-total function. As we have discussed, this does not give a mass 
problem in Medvedev sense directly, but we may consider a mass problem of enumerating the 
graph of a. If this problem is solvable, there exists a computable partial function a defined 
on a set of density 1 that never gives wrong answers. In this case A is called generically 
computable. 

These two notions of approximate computability are incomparable: an enumerable set 
can be coarsely computable but not generically computable and vice versa (see [10]). 

Using this language, we can say that Theorem 5.1 is about coarse computability (we get 
there a total approximating function with errors) while Theorem 5.3 can be applied to both 
notions (and even to their combination where we allow partial functions and errors at the 
same time). Now we state two results about the generic model: 


Theorem 5.5. 

(a) Let $U$ be an optimal machine. There is no randomized algorithm which with positive 
probability computes a partial function $a$ that makes no errors for $\mathrm{dom}(U)$ and is 
defined on a set of upper density 1. 

(b) Whether $\mathrm{dom}(U)$ is $(1 - \varepsilon)$-i.o.-generically probabilistically computable or not depends 
on the particular choice of machine $U$. 


The upper density requirement in (a) means that $E(a, \mathrm{dom}(U)) = 0$, so this result shows 
that the permission to give (rare) wrong answers was crucial for Theorem 5.1. Note that 
the randomized algorithm in this result is still permitted to make errors (with positive 
probability), but we require that with positive probability it computes a function without 
errors (and with dense domain). 

Proof. (a) Using again the density argument, we may assume without loss of generality that 
a randomized algorithm $T$ computes some partial function with the required properties with 
probability close to 1, say, greater than 0.9. Let $Q$ be the set of good oracles; the measure 
of $Q$ exceeds 0.9 and for every $X \in Q$ the machine $T$ with oracle $X$ computes some partial 
function that has dense domain and makes no errors for $\mathrm{dom}(U)$. Note that for different 
oracles $X \in Q$ we may get different partial functions with these properties. 

We simulate the behavior of T on all possible random bits looking for the oracle prefixes 
that guarantee the algorithm’s answers for most strings of length at most n, for some 
sufficiently large $n$. More precisely, for some integer $d$ we are looking for triples $(n, a, x)$ 
where $n$ is an integer (length), $a$ is a partial 0-1-valued function on strings of length at 
most $n$, and $x$ is a bit string (interpreted as an oracle’s prefix), such that 

• $n \ge d$; 

• $a$ is defined on at least a $(1 - 2^{-d})$-fraction of strings of length at most $n$; 

• $x$ guarantees answers given by $a$: for every oracle $X$ that extends $x$, the machine $T$ with 
oracle $X$ computes some extension of $a$. 

The first requirement ($n \ge d$) guarantees that the $2^{-d}$-fraction mentioned in the second 
requirement makes sense for strings of length at most $n$. For every $d$ and for every oracle 
$X$ in $Q$ there exists some prefix $x$ and some $n$ and $a$ with these properties (since $T^X$ has 
domain of upper density 1). By compactness, for every $d$ one can effectively find a finite set 
of triples $(n_1, a_1, x_1), \ldots, (n_k, a_k, x_k)$ such that 

• $n_i \ge d$ for all $i$; 

• the intervals $\Omega_{x_i}$ (containing extensions of $x_i$) cover more than 90% of the Cantor space; 

• oracle prefix $x_i$ guarantees output $a_i$; 

• $a_i$ is defined for a $(1 - 2^{-d})$-fraction of all strings of length at most $n_i$. 

One may assume that all intervals $\Omega_{x_i}$ are disjoint: we may split large intervals into small 
parts and delete the repetitions. 

Note that there is no guarantee that the answers provided by $a_i$ are correct for $\mathrm{dom}(U)$: 
we know that $T$ produces correct approximations for oracles in $Q$, but some of the intervals 
$\Omega_{x_i}$ could be outside $Q$. Note, however, that if $\Omega_{x_i}$ has non-empty intersection with $Q$, then 
the answers in $a_i$ are correct for $\mathrm{dom}(U)$. 

Now we classify all the triples $(n_i, a_i, x_i)$ according to $n_i$: the triples that have the 
same $n_i$ are put into one group. Consider some group that corresponds to some length 
$n$. For different triples in this group the answers may be inconsistent. Still, we try to 
guess the number of strings of length at most $n$ in $\mathrm{dom}(U)$, using some kind of majority 
voting. We count the number of positive answers in $a_i$ for each triple in the group. If there 
exists some $u$ such that most triples in the group have the number of positive answers in 
the $2^{n-d}$-neighborhood of $u$, we choose some $u$ with this property and declare it to be our guess 
for length $n$. In the last sentence the term “most triples” is understood in the measure 
sense: the corresponding intervals $\Omega_{x_i}$ should cover more than 50% of the total measure of 
all intervals in the group. 
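
The voting step for a single group can be sketched as follows. This is a minimal illustration, not the authors' procedure verbatim: the names `majority_guess`, `group` and `radius` are hypothetical, each triple is reduced to a pair (number of positive answers, measure of its interval), and `radius` plays the role of $2^{n-d}$.

```python
from fractions import Fraction

def majority_guess(group, radius):
    """group: list of (positive_count, measure) pairs for one length n.

    Return some u such that intervals carrying more than half of the group's
    total measure have counts within `radius` of u, or None if there is none.
    """
    total = sum(w for _, w in group)
    counts = sorted(group)
    for lo, _ in counts:
        # Any window of width 2*radius can be shifted to start at a count
        # without losing points, so it suffices to try these windows.
        inside = sum(w for c, w in counts if lo <= c <= lo + 2 * radius)
        if 2 * inside > total:
            return lo + radius
    return None
```

For example, with counts 10 and 12 carrying measure 1/4 each and an outlier 50 of measure 1/8, the call `majority_guess(..., 2)` returns a value within distance 2 of both 10 and 12; with two far-apart counts of equal measure no guess is made.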


Lemma 5.6. For at least one group (corresponding to some length $n$) the guess is made 
and is $O(2^{n-d})$-close to the number of strings of length at most $n$ in $\mathrm{dom}(U)$. 

Note that for other groups no guess (or a wrong guess) could be made. 


Proof. The set $Q$ has measure greater than 0.9, so its complement has measure less than 0.1. 
On the other hand, the union of all $\Omega_{x_i}$ has measure at least 0.9, so at least 8/9 of it consists 
of oracles in $Q$. So for some length $n$ at least 8/9 of the intervals of the corresponding 
group have non-empty intersection with $Q$. For these intervals the correct answer belongs to 
the $2^{n-d}$-neighborhoods of the numbers computed from $a_i$, so the guess is made, and it is 
$O(2^{n-d})$-close to the correct answer. □ 


Now note that we have described the process that is effective when $d$ is given. Therefore, 
for each length $n$ the guess (if it is made for this length) has conditional complexity (given 
$n$) at most $O(\log d)$. If it is $O(2^{n-d})$-close to the correct number ($H_n$), then 

$$C(H_n \mid n) \le n - d + O(\log d).$$

On the other hand, $C(H_n \mid n) \ge n - O(1)$, so for large $n$ we get a contradiction. 
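
Written as one chain, the two bounds read as follows. The constants $c$, $c'$ are names introduced here for the hidden constants, under the assumption (suggested by the construction) that they do not depend on $d$.

```latex
% Combining the upper and lower bounds on C(H_n | n); c and c' are
% hypothetical names for the hidden constants, assumed independent of d.
\[
  n - c \;\le\; C(H_n \mid n) \;\le\; n - d + c'(\log d + 1)
  \quad\Longrightarrow\quad
  d - c'(\log d + 1) \;\le\; c ,
\]
% which fails as soon as d (and hence n \ge d) is large enough.
```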

(b) The positive part of (b) is true even without randomization, as we have seen in 
Theorem 3.3 (b). The negative part can be proven in the same way as Theorem 3.1 (d), but 
we need a more sophisticated tool instead of a sparse simple set. Here it is. 


Lemma 5.7. There exists an enumerable set $S$ of density 0 with the following property: for 
every randomized algorithm $T$ the probability of the event “$T$ enumerates an infinite subset 
of the complement of $S$” is zero. 


The first impression is that the statement of Lemma 5.7 is false. Indeed, if the density 
of $S$ is small, then choosing uniformly a random element in $1, \ldots, N$ for large $N$, we get an 
element outside $S$ with probability close to 1. Then we repeat this procedure with much 
larger $N$ and much smaller probability of error, etc. If the sum of error probabilities is less 
than 1, in this way we (with positive probability) enumerate an infinite set outside $S$. 

What is wrong with this argument? We implicitly assumed here that the density 
computably converges to 0 (when choosing N large enough to make error probability small). 
So the set S with required properties may exist (and it does!), though the convergence for 
density cannot be computable. 
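
To see where computability of the convergence enters, here is a minimal sketch of the naive sampling procedure. The function `modulus` is hypothetical: it is assumed to return, for a rational `delta`, some $N$ such that the density of $S$ among $1, \ldots, N$ is below `delta`. The naive argument silently needs this function to be computable; the toy set of powers of two used below has such a modulus, which is exactly what the set of Lemma 5.7 lacks.

```python
import random
from fractions import Fraction

def naive_sampler(modulus, stages, rng):
    """Pick one number per stage; stage i tolerates error probability 2^-(i+2)."""
    picks = []
    for i in range(stages):
        delta = Fraction(1, 2 ** (i + 2))  # the error budgets sum to less than 1/2
        N = modulus(delta)                 # hypothetical: density of S in 1..N is < delta
        picks.append(rng.randrange(1, N + 1))
    return picks

def toy_modulus(delta):
    """Modulus for the toy set S = {1, 2, 4, 8, ...}.

    The number of powers of two in 1..N is N.bit_length(), so the density
    of S in 1..N equals N.bit_length() / N.
    """
    N = 2
    while Fraction(N.bit_length(), N) >= delta:
        N *= 2
    return N
```

A call such as `naive_sampler(toy_modulus, 5, random.Random(0))` returns five numbers, the $i$th drawn from a range in which the toy set has density below $2^{-(i+2)}$.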


Proof of Lemma 5.7. Due to the Lebesgue density argument, it is enough to show (for some 
enumerable set $S$) that no randomized algorithm can enumerate an infinite set outside $S$ 
with probability more than 1/2. This statement can be split into a sequence of requirements 
$R_e$, where $R_e$ says that the $e$th randomized algorithm $T_e$ fails to do that. And, of course, we 
have the density 0 requirement (in addition to all $R_e$). 

We take care of $R_e$ separately for each $e$, having some process that adds some strings 
to $S$. Note that the processes for different $e$ may only help each other (the bigger $S$ is, the 
more difficult it is to enumerate an infinite set outside $S$), except for the density requirement: 
we need to ensure that all the strings added to $S$ by all processes (for all $e$) still form a set of 
density 0. To take care of densities, let us take a convergent series of rational numbers $\varepsilon_e$ (with 
$E = \sum_e \varepsilon_e < 1$) and agree in advance that the $e$th process may add to $S$ at most an $\varepsilon_e$-fraction 
of strings of length $k$ for every length $k$. For example, we may let $\varepsilon_e = 2^{-(e+2)}$. This 
guarantees that the density of $S$ does not exceed $E$, and some additional argument will 
show that it is actually 0. 

Now we describe the $e$th process (that takes care of $R_e$). There is a technical problem: 
if $\varepsilon_e$ is small, the permission to use an $\varepsilon_e$-fraction of strings for each length does not allow 
us to use short strings, because even one string added may exceed the threshold. Since we 
are interested in the enumeration of an infinite set outside $S$, we may ignore short strings 
and start with some length $N_e$ where we are allowed to add at least an $\varepsilon_e/2$-fraction of 
the strings of that length without crossing the $\varepsilon_e$-threshold. So without loss of generality we may 
assume that $T_e$ generates only strings of length at least $N_e$. 

We simulate $T_e$ on all oracles to find which strings appear as first elements of the 
enumeration and what their probabilities are. In this way we get a lower semicomputable 
probability distribution on strings of length at least $N_e$. If the total probability (the sum of 
probabilities for all possible output strings) never exceeds 1/2, we do not do anything and 
$R_e$ is vacuously satisfied. 

As soon as the sum of probabilities exceeds 1/2, we stop the simulation and choose 
which strings should be added to $S$. For each $n \ge N_e$ we sort all the strings of length $n$ in 
order of decreasing probability and add to $S$ the first $\lceil (\varepsilon_e/2) 2^n \rceil$ of them (so the fraction is at 
least $\varepsilon_e/2$ and at most $\varepsilon_e$, due to the assumption about $N_e$). 
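
The selection for one length can be sketched as follows. This is only an illustration under stated assumptions: `probs` is a hypothetical finite table standing in for the probabilities approximated at the moment the simulation stops, the quota uses the ceiling $\lceil (\varepsilon_e/2) 2^n \rceil$, and `min_length` is one possible choice of the threshold length $N_e$.

```python
from fractions import Fraction
from itertools import product
from math import ceil

def min_length(eps):
    """Smallest n with (eps/2) * 2^n >= 1; for such n the quota is at most eps * 2^n."""
    n = 0
    while eps / 2 * 2 ** n < 1:
        n += 1
    return n

def strings_to_add(probs, n, eps):
    """Return the ceil((eps/2) * 2^n) most probable strings of length n.

    probs maps strings to their current probabilities (missing means 0).
    """
    quota = ceil(eps / 2 * 2 ** n)
    all_strings = ["".join(bits) for bits in product("01", repeat=n)]
    ranked = sorted(all_strings, key=lambda s: probs.get(s, Fraction(0)), reverse=True)
    return ranked[:quota]
```

With `eps = Fraction(1, 4)` the threshold length is 3, and for every $n \ge 3$ the selected fraction lies between `eps / 2` and `eps`, as the text requires.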

What do we achieve in this way? We cover by $S$ some set of strings that have total 
probability (in the $T_e$-enumeration) at least $\varepsilon_e/4$. Indeed, for each length we cover at least 
an $\varepsilon_e/4$-fraction of total probability for this length (by taking an $\varepsilon_e/4$-fraction of most probable 
strings), and the sum of these total probabilities for all lengths is at least 1/2. Then we 
increase $N_e$, making it greater than the lengths of all strings we have added to $S$, carve out from the 
probability space the part corresponding to covered elements, and repeat the process. Since 
our goal is to prevent enumerating an infinite set outside $S$, the increase in $N_e$ does not 
matter. The second stage (if the probability reaches 1/2 again) carves out an additional $\varepsilon_e/4$ 
from the probability of enumerating an infinite set outside $S$. (Note that we do not simulate 
$T_e$ further for the parts of the probability space already carved out.) And so on: the 
number of iterations is bounded, since each time we decrease the probability by at least 
$\varepsilon_e/4$, and at some stage the probability never exceeds 1/2, and $R_e$ is satisfied. 

It remains to explain why the resulting $S$ (the union of all elements added by all the 
processes) has zero density. Since we add only finitely many elements to $S$ for each of the 
requirements $R_e$, we know that starting from some point, the density of $S$ is bounded by 
the tail of the series $\sum_e \varepsilon_e$. This tail can be arbitrarily small, so the density of $S$ is zero. □ 
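
For the example choice $\varepsilon_e = 2^{-(e+2)}$ the sum and its tails can be computed explicitly. Indexing the requirements from $e = 0$ is an assumption made here for concreteness.

```latex
% Worked example for \varepsilon_e = 2^{-(e+2)}, requirements indexed from e = 0 (assumption).
\[
  E = \sum_{e \ge 0} 2^{-(e+2)} = \tfrac12 < 1,
  \qquad
  \sum_{e \ge m} 2^{-(e+2)} = 2^{-(m+1)} \xrightarrow[m \to \infty]{} 0 .
\]
```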


The set provided by Lemma 5.7 allows us to finish the proof of Theorem 5.5: it is 
embedded with positive density in the domain of every effectively optimal machine, and for 
this part the probabilistic algorithm either makes errors or generates only a sparse set, so 
the density of the domain is separated from 1 if no errors are made. □ 


6. Busy beavers and fraction of long computations 


We define the “busy beaver” function $BB(n)$ as the maximal running time for terminating 
computations of the optimal machine on strings of length at most $n$. This function increases 
very fast (faster than any computable function) and provides some invariant scale for 
measuring the running time of computations in that it does not depend significantly on the 
choice of the optimal machine and computational model used to measure the running time. 
Namely, the following invariant definition is possible: 

Proposition 6.1. Let $B(n)$ be the maximal integer of complexity at most $n$; then 

$$B(n - c) \le BB(n) \le B(n + c)$$

for some $c$ and for all $n$. 

(See [22] for the proof and more details.) 

So we know that if we run the optimal machine on all the inputs of size at most $n$, we 
need to make $BB(n)$ steps (for each input) until all the terminating computations terminate. 
A natural question arises: how long should we wait until most of the computations (say, 
except for $2^{n-k}$, i.e., a $2^{-k}$-fraction) terminate? The following theorem gives an answer with 
logarithmic precision (note that both $K(k \mid n)$ and $K(n)$ are logarithmic in $n$): 


Theorem 6.2. 

(a) If after $t$ steps at most $2^{n-k}$ terminating computations are still running, then 

$$t \ge BB(k - K(k \mid n) - O(1)).$$

(b) If after $t$ steps more than $2^{n-k}$ terminating computations are still running, then 

$$t \le BB(k + K(n) + O(1)).$$


Proof. (a) Since $BB(i)$ (up to an $O(1)$-change in the argument, see Proposition 6.1) can be 
defined as the maximal number of complexity at most $i$, we need to show only that for every 
such $t$ we have $C(t) \ge k - K(k \mid n) - O(1)$. 

Indeed, to reconstruct $H_n$ given $n$, it is enough to know $t$ and the number $N$ of 
terminating computations (on inputs of length at most $n$) that are still running after $t$ steps. 
To encode this information, we start with a self-delimited description of $k$ given $n$ (using 
$K(k \mid n)$ bits), then append $N$ written in binary (using an $(n - k)$-bit string), and finally 
append a $C(t)$-bit description of $t$. Knowing $n$ and this encoding, we first find $k$, then read 
the next $n - k$ bits (that form the number $N$), use the rest to reconstruct $t$, and then 
make $t$ steps for each input of length at most $n$, count the terminated computations and 
add $N$ to get $H_n$. Therefore, $K(k \mid n) + n - k + C(t) \ge C(H_n \mid n) \ge n - O(1)$, and we get 
the desired inequality. 

(b) To prove this bound, let us construct a number of complexity at most $k + K(n)$ 
that exceeds $t$. Consider the number $H_n$ of terminating programs of length at most $n$, and 
consider its first $k$ bits (i.e., $H_n$ where the last $n - k$ bits are replaced by zeros). Knowing 
this string and $n$, we can wait until that many terminating computations appear, and get 
the number $t$ of steps needed for that. In total it is enough to know $K(n) + k$ bits, as we 
promised. □ 
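
Both decoding procedures from the proof can be tried out on a toy model. The sketch below is purely illustrative: for a real optimal machine the halting times are not computable, so the table `halt_time` (mapping an input to its number of steps, or to `None` if it never halts) is a hypothetical stand-in, and all function names are introduced here.

```python
def reconstruct_H(halt_time, n, t, N):
    """Part (a): recover H_n from t and the number N of terminating
    computations that are still running after t steps."""
    done = sum(1 for x, s in halt_time.items()
               if len(x) <= n and s is not None and s <= t)
    return done + N

def truncate(H, n, k):
    """H with its last n - k bits replaced by zeros."""
    return (H >> (n - k)) << (n - k)

def waiting_time(halt_time, n, target):
    """Part (b): number of steps after which `target` computations
    on inputs of length at most n have terminated."""
    times = sorted(s for x, s in halt_time.items()
                   if len(x) <= n and s is not None)
    return times[target - 1] if target > 0 else 0

# Toy table: six terminating inputs of length at most 3, so H_3 = 6.
toy = {"": 1, "0": 5, "1": None, "00": 2, "01": 100, "11": 7, "000": 3}
```

In this toy table, after 6 steps two terminating computations are still running, and `reconstruct_H(toy, 3, 6, 2)` gives back 6. Waiting for `truncate(6, 3, 1)` terminations leaves fewer than $2^{n-k} = 4$ terminating computations unfinished.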

One would like to get rid of the logarithmic terms in these statements. Sometimes it 
is indeed possible. For example, in (b) we can replace $K(n)$ by $O(C(n \mid k))$. Indeed, a 
prefix-free description of a program that maps $k$ to $n$ uses $O(C(n \mid k))$ bits, and if we append 
the first $k$ bits of $H_n$ to it, we can then reconstruct the prefix-free description, then $k$, then 
$n$, and finally $t$. This implies, for example, the following corollary: 

Corollary 6.3. Making $BB(n/2)$ steps of computation of the universal machine for each 
input of length at most $n$, we have $2^{n/2 \pm O(1)}$ unfinished (terminating) computations. 

Still it would be nice to find a more general result with $O(1)$-precision (or to show that 
logarithmic terms are unavoidable in the general case), which we leave as an open question. 

7. Some generalizations 

7.1. Beyond the complexity arguments: coarse separation. Most of the arguments 
above are based on the lower bound for the Kolmogorov complexity $C(H_n \mid n)$. Still there 
are natural questions of similar type where the complexity argument does not help us much 
(at least not in an obvious fashion). 

Recall the concept of enumerable inseparable sets, which we already used in Section 2: 
two enumerable sets $A$ and $B$ are inseparable if there is no separator, i.e., no total computable 
function $h$ (defined on strings and taking 0/1-values) such that $h(A) = 0$ and $h(B) = 1$. In 
other words, every computable function makes errors: there are some error points $x$ such 
that either $h(x)$ is undefined, or $x \in A$ and $h(x) = 1$, or $x \in B$ and $h(x) = 0$. 

Here we are interested in the density of the error points. Is it possible that for some 
enumerable inseparable sets $A$ and $B$ there exists a computable (not necessarily total) 
function $h$ for which the error points have density 0? As before, this does happen for 
certain $A$ and $B$. Assume, for example, that $A$ and $B$ consist only of strings of the form 
$0^m$: we take two inseparable sets of integers $U$ and $V$ and let $A = \{0^m : m \in U\}$ and 
$B = \{0^m : m \in V\}$. Obviously, for every total function only strings of type $0^m$ could be 
error points, and these strings form a set of density 0. To make the question non-trivial, we 
need to consider some standard inseparable sets, as we did before for the halting problem. 
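
The density claim in this example is a one-line count (a sketch, counting the empty string as $0^0$):

```latex
% Among the 2^{n+1} - 1 strings of length at most n, exactly n + 1 have the form 0^m.
\[
  \frac{\#\{0^m : 0 \le m \le n\}}{\#\{x : |x| \le n\}}
  = \frac{n + 1}{2^{n+1} - 1} \xrightarrow[n \to \infty]{} 0 .
\]
```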

Theorem 7.1. Let $U$ be an effectively optimal machine. Then the sets $A = \{x : U(x) = 0\}$ 
and $B = \{x : U(x) = 1\}$ are disjoint and inseparable. Moreover, for every computable 
function $h$ the fraction of errors made by $h$ (as a separator) on strings of length at most $n$, 
i.e., 

$$E_n = \#\{x : |x| \le n,\ (h(x) \text{ is undefined}) \lor (x \in A \land h(x) = 1) \lor (x \in B \land h(x) = 0)\} / 2^n,$$

is separated from 0 for large $n$, i.e., $\liminf E_n > 0$. 

(Again we should write $2^{n+1} - 1$, to be exact, but this does not matter for the limit.) 

Proof. For every computable function $V$ the injective length-bounded reduction $t_V$ of $V$ 
to an optimal machine can be found effectively. Indeed, this happens for the “standard” 
effectively optimal machine, and then we can use Schnorr’s isomorphism result to conclude 
that this is true for any effectively optimal machine. So we can construct $V$ assuming (in a 
fixed-point way) that the reduction $t_V$ is known in advance, and let $V(x) = 1 - h(t_V(x))$. 
Then every point in the image of $t_V$ is an error point for $h$, and they have positive lower 
density because $t_V$ is injective and length-bounded. □ 

Technical remark (footnote to the replacement of $K(n)$ by $O(C(n \mid k))$ in Section 6): it is 
important that the description is self-delimiting when $k$ is not known, so this 
argument does not allow us to write $K(n \mid k)$ instead of $O(C(n \mid k))$. 

Remark 7.2. In this result the effective optimality is essential. Indeed, one can construct 
an optimal machine $U$ such that $U^{-1}(0)$ and $U^{-1}(1)$ each contain only one element. (We 
keep some preimage of 0 and suppress all others; we also do the same thing with 1.) For 
this optimal machine the statement is evidently false. 

7.2. Schnorr-type isomorphism results. Schnorr [20] considered not only effectively 
optimal machines (he called them “optimal enumerations”) but also a similar notion for 
numberings of computable functions. Each computable function $U(\cdot,\cdot)$ of two arguments 
can be considered as a numbering of some class of computable functions of one argument. 
Namely, for each $e$ we consider the function $U_e \colon x \mapsto U(e, x)$ and say that $U_e$ has number $e$ 
in the $U$-numbering. (Since we speak about numbers, we assume that both arguments are 
natural numbers, not strings.) 

If every computable function of one argument appears among the $U_e$’s, the function $U$ 
is universal (for the class of computable functions of one argument). A universal function 
$U$ is called a Gödel universal function if for every other computable function $V(\cdot,\cdot)$ the 
$V$-numbering can be reduced to the $U$-numbering via some total computable function $h$; this 
means that 

$V_e = U_{h(e)}$ for all $e$, i.e., $V(e, x) = U(h(e), x)$ for all $e$ and $x$. 

The last equation means that $V(e, x)$ and $U(h(e), x)$ are both undefined or both defined 
and equal. 

Requiring the existence of a length-bounded reduction function $h$, we get the definition 
of an effectively optimal numbering. Since we consider $e$ as an integer, length-bounded 
functions are defined as functions $h \colon \mathbb{N} \to \mathbb{N}$ such that $h(n) = O(n)$. (This is the same 
notion as before if we identify natural numbers with strings in a standard way.) One can 
prove (in a standard way) that effectively optimal numberings exist; corresponding functions 
of two arguments are also called effectively optimal. 
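
A reduction between numberings, and the bound $h(n) = O(n)$, can be illustrated on a toy pair of numberings. Everything in this sketch (`U`, `V`, `h`) is hypothetical and chosen for illustration; in particular this `U` is not universal, and all functions involved are total, so the "both undefined" case does not arise.

```python
def V(e, x):
    """Toy numbering: V_e multiplies its argument by e."""
    return e * x

def U(e, x):
    """Toy numbering interleaving two families:
    even numbers carry the functions V_(e/2), odd numbers carry shifts."""
    return (e // 2) * x if e % 2 == 0 else x + e // 2

def h(e):
    """Total computable reduction of V to U; length-bounded since h(e) = 2e = O(e)."""
    return 2 * e
```

Here `V(e, x) == U(h(e), x)` for all `e` and `x`, so every `V`-number `e` is translated into a `U`-number that is only one bit longer.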

In [20], Schnorr used the name “optimal numbering” for this notion, but it is natural to 
reserve this name for a weaker notion defined as follows. Every computable function $U(\cdot,\cdot)$ 
can be used to measure “complexity” of computable functions: if $f(\cdot)$ is some computable 
function, then its complexity $C_U(f)$ is defined as the logarithm of the minimal $U$-number 
of $f$. (Logarithms of natural numbers correspond to lengths for strings.) Complexity 
$C_U(f)$ is infinite if $f$ does not appear among the $U_e$. The computable function $U(\cdot,\cdot)$ and the 
corresponding numbering are called optimal if $C_U$ is minimal up to an $O(1)$ additive term, i.e., 
for every computable $V(\cdot,\cdot)$ there exists some $c$ such that $C_U(f) \le C_V(f) + c$ for all $f$. 
This implies that $C_U(f)$ is finite for every computable $f$, i.e., that $U$ is universal. 

Now we return to effectively optimal universal functions (numberings). Schnorr proved 
in [20] that for every two effectively optimal universal functions $U(\cdot,\cdot)$ and $U'(\cdot,\cdot)$ there 
exists a computable bijection $h$ that is length-bounded in both directions and $U_e = U'_{h(e)}$ 
for all $e$, i.e., $U(e, x) = U'(h(e), x)$ for all $e$ and $x$. 

Looking at three similar results (the Myhill-Schnorr theorem, Schnorr’s result about optimal 
machines, and the just mentioned Schnorr’s result about numberings), it is natural to ask 
whether they can be generalized. Indeed it is possible, in a rather straightforward way. 
Before explaining this general framework, let us consider one more example: numberings of 
$\Sigma_n$-sets. 

Let us fix some $n$ and consider $\Sigma_n$-sets, i.e., sets that can be obtained from a decidable 
predicate by an $n$-quantifier prefix starting with $\exists$. There exists a universal $\Sigma_n$-set of pairs 
(this means that every $\Sigma_n$-set can be obtained as its “vertical section”, by fixing the first 
component); such a set determines a numbering of $\Sigma_n$-sets. Then we can define optimal 
universal sets (that give minimal complexity function) and effectively optimal universal sets 
(that provide a length-bounded computable reduction for every other $\Sigma_n$-set of pairs), and 
prove the existence of effectively optimal universal sets. 

Optimal $\Sigma_n$-sets can be used to define a complexity notion for $\Sigma_n$-sets, and the “Schnorr 
theorem for $\Sigma_n$-sets” then says that every two effectively optimal universal sets differ by a 
computable bijection that is length-bounded in both directions. 

One can provide some general framework for all these examples. Let $T$ be some abstract 
set. We assume that one special element $\bot \in T$ is fixed, as well as some class of total 
mappings $\mathbb{N} \to T$, called “enumerations”. We assume that the class of enumerations is 
closed with respect to partial computable reductions: 

(1) if $\nu \colon \mathbb{N} \to T$ is an enumeration, and $f \colon \mathbb{N} \to \mathbb{N}$ is a partial computable 
function, then the function 

$\mu(n) =$ if $f(n)$ is defined then $\nu(f(n))$ else $\bot$ fi 

is an enumeration, 

and has a maximal element: 

(2) there exists an enumeration $\nu$ such that for every enumeration $\mu$ there is 
a total computable $h$ such that $\mu(n) = \nu(h(n))$ for all $n$. 
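
Condition (1) can be mimicked in a toy setting. This is only a sketch under simplifying assumptions: the element $\bot$ is represented by `None`, and a partial function is modelled by a Python function that returns `None` where it is undefined (a genuinely partial computable function may instead fail to terminate). The names `BOTTOM` and `reduce_enumeration` are introduced here.

```python
BOTTOM = None  # stands for the special element "bottom" of T

def reduce_enumeration(nu, f):
    """Condition (1): mu(n) = nu(f(n)) if f(n) is defined, and bottom otherwise."""
    def mu(n):
        m = f(n)
        return BOTTOM if m is None else nu(m)
    return mu

# Toy usage: nu enumerates squares, f halves even numbers and is undefined on odd ones.
mu = reduce_enumeration(lambda n: n * n, lambda n: n // 2 if n % 2 == 0 else None)
```

Here `mu(4)` is `nu(2)`, while `mu(3)` is the bottom element because `f(3)` is undefined.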

Examples: 

• $T = \{\bot, \top\}$; enumerations correspond to enumerable sets (elements of the set are mapped 
to $\top$, others to $\bot$); 

• $T = \mathbb{N} \cup \{\bot\}$; enumerations are partial computable functions where undefined values are 
replaced by $\bot$; 

• $T$ is the family of computable partial unary functions; enumerations correspond to 
computable partial binary functions; 

• $T$ is the family of $\Sigma_n$-sets; enumerations correspond to $\Sigma_n$-sets of pairs. 

For a given enumeration $\nu$, a complexity function is defined: the complexity of $t \in T$ is the 
logarithm of the minimal $\nu$-number of $t$. As usual, identifying integers with strings, we may 
define complexity as the minimal length of $p$ such that $\nu(p) = t$. For the first example 
this notion is meaningless, but for the other three it is reasonable. (We get Kolmogorov 
complexity for the second example and complexity of computable functions as defined by 
Schnorr for the third one.) 

One can prove (by a standard argument) that there exist optimal enumerations that 
make the complexity minimal up to $O(1)$; one can also strengthen this statement and derive 
from conditions (1) and (2) that there exists an effectively optimal enumeration (every 
enumeration is reducible to it by a total length-bounded function). 

Footnotes. On the complexity notion for $\Sigma_n$-sets: similar ideas were considered in [6]. 
On the term “enumerations”: we use the name suggested by Schnorr, though in a more general setting. 
On the first example: here the term “enumeration” sounds confusing, since the enumerable set is the 
domain of an enumeration, not its range. 

Indeed, let $\nu$ be a maximal enumeration from (2); the first property gives us an 
enumeration $\omega(\bar{e}x) = \nu([e](x))$ where $\bar{e}$ is the standard self-delimiting encoding of $e$, and 
$[e](x)$ is the output of program $e$ on input $x$; if $[e](x)$ is undefined, then $\omega(\bar{e}x) = \bot$. (Here 
we identify integers with strings and use self-delimiting encoding and concatenations.) The 
enumeration $\omega$ is effectively optimal: if $\mu$ is some other enumeration, then $\mu(x) = \nu(f(x))$ 
for some total computable $f$, since $\nu$ is maximal; if $e$ is the program for $f$, then $\mu(x) = 
\nu([e](x)) = \omega(\bar{e}x)$, so $x \mapsto \bar{e}x$ is a length-bounded reduction. 
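
The reason why $x \mapsto \bar{e}x$ is length-bounded can be seen on a concrete self-delimiting code. The text does not fix the encoding; the bit-doubling code below is one common choice, assumed here only for illustration, and the names `encode` and `split` are hypothetical.

```python
def encode(e):
    """Self-delimiting code (assumed scheme): double every bit of e, then append '01'."""
    return "".join(2 * b for b in e) + "01"

def split(s):
    """Split a string of the form encode(e) + x back into the pair (e, x)."""
    e, i = [], 0
    while s[i:i + 2] != "01":
        e.append(s[i])   # each doubled pair contributes one bit of e
        i += 2
    return "".join(e), s[i + 2:]
```

With this code the string `encode(e) + x` has length `len(x) + 2 * len(e) + 2`, so for a fixed program `e` the map from `x` to `encode(e) + x` increases the length only by a constant.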

To apply Schnorr’s argument in this general framework, one more property is needed: 

(3) if $\tau$ is an enumeration, and $G$ is an enumerable set of integers, there 
exists an enumeration $\mu$ such that: 

• if $x \notin G$, then $\mu(x) = \tau(x)$; 

• if $x \in G$, then $\mu(x) \ne \bot$. 

It is easy to check this property for all the examples above: it says, informally speaking, that 
at any moment we may change an object in $T$ making it different from the “bottom” object 
$\bot$. Having this property, we may state a generalized version of Schnorr’s result as follows: 


Theorem 7.3. Assume that a set $T$ and a class of enumerations are fixed that satisfy 
the requirements (1)-(3). Let $\nu$ and $\nu'$ be two effectively optimal enumerations. Then 
there exists a computable bijection $h$ that is length-bounded in both directions such that 
$\nu'(x) = \nu(h(x))$ for all $x \in \mathbb{N}$. 


Proof. As before, it is enough to show that every enumeration can be reduced to every 
effectively optimal enumeration by a total computable length-bounded injection. The second 
(combinatorial) part of the proof remains unchanged. Assume that we have an effectively 
optimal (therefore, universal) enumeration $\omega(\cdot)$ and some other enumeration $\mu(\cdot)$. We want 
to reduce $\mu$ to $\omega$ by some computable injective length-bounded total function. 

First, we construct the uniform size-bounded reduction of all enumerations. The 
property (1) guarantees that there exists an enumeration $\nu$ such that 

$$\nu(\bar{e}x) = \begin{cases} \omega([e](x)), & \text{if } [e](x) \text{ is defined;} \\ \bot & \text{otherwise.} \end{cases}$$

Since $\omega$ is optimal, there exists a total computable function $h(e, x)$ such that $\nu(\bar{e}x) = 
\omega(h(e, x))$ and $|h(e, x)| \le |x| + c_e$ for all $x$ and $e$. Here $c_e$ depends on $e$ (but not on $x$); in 
fact $|h(e, x)| \le |x| + |\bar{e}| + O(1)$. Recalling the construction of $\nu$, we see that 

$$\omega(h(e, x)) = \begin{cases} \omega([e](x)), & \text{if } [e](x) \text{ is defined;} \\ \bot & \text{otherwise.} \end{cases}$$

We want to find $m$ such that $[m]$ is total, 

$$\omega([m](x)) = \mu(x)$$

for all $x$, and the function $x \mapsto h(m, x)$ is injective. Then this function will be the required 
computable injective length-bounded reduction of $\mu$ to $\omega$. 

To find such an $m$, we construct some total function $M(m, x)$ and use the fixed-point 
theorem to conclude that $M(m, x) = [m](x)$ for some $m$ and for all $x$. This function 
$M(m, x)$ will have the following properties: 

• if $h(m, x) = h(m, x')$ for some $x' < x$, then $M(m, x)$ is some (fixed) preimage of $\bot$; 

• if $h(m, x)$ does not appear among $h(m, x')$ for all $x' \ne x$, then $\omega(M(m, x)) = \mu(x)$; 

• if $h(m, x)$ does not appear among $h(m, x')$ for $x' < x$, but appears among $h(m, x')$ for 
$x' > x$, then $\omega(M(m, x)) \ne \bot$. 

The first condition is computable, so we can start by checking it. Then we need to implement 
a choice between the second and third conditions. The third one defines an enumerable 
set of pairs $(m, x)$, and we can use the property (3) for this set and the enumeration 
$\tau(\langle m, x \rangle) = \mu(x)$ obtained by using (1). 

The fixed point theorem provides some $m$ such that $[m](x) = M(m, x)$ for all $x$; 
in particular, $[m]$ is total. Let us prove that $x \mapsto h(m, x)$ is an injection. Assume that 
$h(m, x) = h(m, x')$ for $x < x'$. Then the first item says that $\omega(M(m, x')) = \bot$, so 
$\omega(h(m, x')) = \omega([m](x')) = \omega(M(m, x')) = \bot$. On the other hand, the pair $(m, x)$ is served 
by the third item, so $\omega(M(m, x)) \ne \bot$, and thus $\omega(h(m, x)) = \omega([m](x)) = \omega(M(m, x)) \ne 
\bot$. We get a contradiction with the assumption $h(m, x) = h(m, x')$. Now we know 
that $x \mapsto h(m, x)$ is injective, and $\omega(M(m, x)) = \mu(x)$ according to the second item, so 
$\omega(h(m, x)) = \omega([m](x)) = \omega(M(m, x)) = \mu(x)$ for all $x$, and we get an injective reduction 
of $\mu$ to $\omega$. □ 

Remark 7.4. Knowing that effectively optimal numberings (as well as effectively optimal 
machines) are unique up to a length-bounded isomorphism, one can consider — for a given 
function or for a given string — its frequency among the first N functions (or the first N 
outputs of the machine), and then take limsup and liminf of these frequencies. Schnorr’s 
result guarantees that these quantities are well defined (up to a constant factor). 

The lim sup is in fact constant (it is easy to construct some effectively optimal numbering 
or effectively optimal machine with this property, so it is $\Omega(1)$ for every numbering or 
machine). The liminf for decompressors, as Muchnik has shown, equals $\mathbf{m}^{\mathbf{0}'}$, the relativized 
a priori probability (see [22, Problem 213] and [3]). 

Question: What can we say about limit frequencies for computable functions and/or 
$\Sigma_n$-sets? Do we get some kind of relativized complexity as well? 